{
 "metadata": {
  "name": "",
  "signature": "sha256:42f56a022073dc712b4838e4641f268889232ea5664c082cf55f2c48f4995d0f"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Learning Scikit-learn: Machine Learning in Python"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "IPython Notebook for Chapter 4: Advanced Features - Model Selection"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "_In the previous section we worked on ways to preprocess the data and select the most promising features. As we stated, selecting a good set of features is a crucial step in obtaining good results. Now we will focus on another important step: selecting the algorithm parameters, known as hyperparameters to distinguish them from the parameters that are adjusted within the machine learning algorithm. Many machine learning algorithms include hyperparameters (from now on we will simply call them parameters) that guide certain aspects of the underlying method and have a great impact on the results. In this section we will review some methods to help us obtain the best parameter configuration, a process known as model selection.\n",
      "We will look back at the text-classification problem we addressed in Chapter 2, Supervised Learning. In that example, we combined a TF-IDF vectorizer with a multinomial Na\u00efve Bayes (NB) classifier to classify a set of newsgroup messages into a discrete set of categories. The MultinomialNB algorithm has one important parameter, named alpha, that adjusts the smoothing. We initially used the class with its default parameter values (alpha = 1.0) and obtained an accuracy of 0.89. But when we set alpha to 0.01, we obtained a noticeable accuracy improvement to 0.92. Clearly, the configuration of the alpha parameter has a great impact on the performance of the algorithm. How can we be sure 0.01 is the best value? Perhaps if we try other possible values, we could obtain even better results._"
     ]
    },
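    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To get a feel for what alpha does, here is a minimal sketch of additive smoothing. It is an illustration only, not scikit-learn's actual code, and the word counts are made up. Each word probability for a class is estimated as (count + alpha) / (total + alpha * vocabulary size), so a large alpha pulls the estimates toward a uniform distribution, while a small alpha keeps them close to the raw frequencies."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "# hypothetical counts of three words within one class (illustrative only)\n",
      "word_counts = np.array([3.0, 0.0, 1.0])\n",
      "\n",
      "for smoothing in (1.0, 0.01):\n",
      "    # additive smoothing: (count + alpha) / (total + alpha * vocabulary size)\n",
      "    probs = (word_counts + smoothing) / (word_counts.sum() + smoothing * len(word_counts))\n",
      "    print('alpha = %g -> %s' % (smoothing, np.round(probs, 3)))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },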
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Start by importing numpy, scikit-learn, and pyplot, the Python libraries we will be using in this chapter. Show the versions we will be using (in case you have problems running the notebooks)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%pylab inline\n",
      "import IPython\n",
      "import sklearn as sk\n",
      "import numpy as np\n",
      "import matplotlib\n",
      "import matplotlib.pyplot as plt\n",
      "\n",
      "print 'IPython version:', IPython.__version__\n",
      "print 'numpy version:', np.__version__\n",
      "print 'scikit-learn version:', sk.__version__\n",
      "print 'matplotlib version:', matplotlib.__version__\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Populating the interactive namespace from numpy and matplotlib\n",
        "IPython version: 2.1.0\n",
        "numpy version: 1.8.2\n",
        "scikit-learn version: 0.15.1\n",
        "matplotlib version: 1.3.1\n"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's start again with our text-classification problem, but for now we will work with a reduced set of only 3,000 instances."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.datasets import fetch_20newsgroups\n",
      "\n",
      "news = fetch_20newsgroups(subset='all')\n",
      "\n",
      "n_samples = 3000\n",
      "\n",
      "X = news.data[:n_samples]\n",
      "y = news.target[:n_samples]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Then load the set of stop words and create a pipeline that combines the TF-IDF vectorizer with the Na\u00efve Bayes classifier (recall that we had a stopwords_en.txt file with a list of stop words)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.naive_bayes import MultinomialNB\n",
      "from sklearn.pipeline import Pipeline\n",
      "from sklearn.feature_extraction.text import TfidfVectorizer"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def get_stop_words():\n",
      "    result = set()\n",
      "    # read one stop word per line; the with-block closes the file when done\n",
      "    with open('data/stopwords_en.txt', 'r') as f:\n",
      "        for line in f:\n",
      "            result.add(line.strip())\n",
      "    return result\n",
      "\n",
      "stop_words = get_stop_words()\n",
      "\n",
      "clf = Pipeline([\n",
      "    ('vect', TfidfVectorizer(\n",
      "                stop_words=stop_words,\n",
      "                token_pattern=ur\"\\b[a-z0-9_\\-\\.]+[a-z][a-z0-9_\\-\\.]+\\b\",         \n",
      "    )),\n",
      "    ('nb', MultinomialNB(alpha=0.01)),\n",
      "])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If we evaluate our algorithm with a three-fold cross-validation, we obtain a mean score of around 0.81.\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.cross_validation import cross_val_score, KFold\n",
      "from scipy.stats import sem\n",
      "\n",
      "def evaluate_cross_validation(clf, X, y, K):\n",
      "    # create a k-fold cross-validation iterator with K folds\n",
      "    cv = KFold(len(y), K, shuffle=True, random_state=0)\n",
      "    # by default the score used is the one returned by score method of the estimator (accuracy)\n",
      "    scores = cross_val_score(clf, X, y, cv=cv)\n",
      "    print scores\n",
      "    print (\"Mean score: {0:.3f} (+/-{1:.3f})\").format(\n",
      "        np.mean(scores), sem(scores))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "evaluate_cross_validation(clf, X, y, 3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[ 0.812  0.808  0.822]\n",
        "Mean score: 0.814 (+/-0.004)\n"
       ]
      }
     ],
     "prompt_number": 6
    },
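    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The +/- figure printed above is the standard error of the mean across the folds. As a small worked check, assuming the default behaviour of scipy.stats.sem (sample standard deviation with ddof=1, divided by the square root of the number of folds), the same summary can be recomputed by hand from the three fold scores:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "# fold accuracies copied from the output above\n",
      "fold_scores = np.array([0.812, 0.808, 0.822])\n",
      "\n",
      "# standard error of the mean: sample standard deviation over sqrt(number of folds)\n",
      "std_err = fold_scores.std(ddof=1) / np.sqrt(len(fold_scores))\n",
      "print('Mean score: %.3f (+/-%.3f)' % (fold_scores.mean(), std_err))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },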
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "It looks like we should train the algorithm with a list of different parameter values and keep the value that achieves the best results. Let's implement a helper function to do that. This function will train the algorithm with each value in the list, each time computing the accuracy on the training and testing folds of a k-fold cross-validation over the given instances. After that, it will plot the mean training and testing scores as a function of the parameter values."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def calc_params(X, y, clf, param_values, param_name, K):\n",
      "    # initialize training and testing scores with zeros\n",
      "    train_scores = np.zeros(len(param_values))\n",
      "    test_scores = np.zeros(len(param_values))\n",
      "    \n",
      "    # iterate over the different parameter values\n",
      "    for i, param_value in enumerate(param_values):\n",
      "        print param_name, ' = ', param_value\n",
      "        \n",
      "        # set classifier parameters\n",
      "        clf.set_params(**{param_name:param_value})\n",
      "        \n",
      "        # initialize the K scores obtained for each fold\n",
      "        k_train_scores = np.zeros(K)\n",
      "        k_test_scores = np.zeros(K)\n",
      "        \n",
      "        # create KFold cross validation over the instances passed in (not the global n_samples)\n",
      "        cv = KFold(len(y), K, shuffle=True, random_state=0)\n",
      "        \n",
      "        # iterate over the K folds\n",
      "        for j, (train, test) in enumerate(cv):\n",
      "            # fit the classifier in the corresponding fold\n",
      "            # and obtain the corresponding accuracy scores on train and test sets\n",
      "            clf.fit([X[k] for k in train], y[train])\n",
      "            k_train_scores[j] = clf.score([X[k] for k in train], y[train])\n",
      "            k_test_scores[j] = clf.score([X[k] for k in test], y[test])\n",
      "            \n",
      "        # store the mean of the K fold scores\n",
      "        train_scores[i] = np.mean(k_train_scores)\n",
      "        test_scores[i] = np.mean(k_test_scores)\n",
      "       \n",
      "    # plot the training and testing scores in a log scale\n",
      "    plt.semilogx(param_values, train_scores, alpha=0.4, lw=2, c='b')\n",
      "    plt.semilogx(param_values, test_scores, alpha=0.4, lw=2, c='g')\n",
      "    \n",
      "    plt.xlabel(param_name + \" values\")\n",
      "    plt.ylabel(\"Mean cross validation accuracy\")\n",
      "\n",
      "    # return the training and testing scores on each parameter value\n",
      "    return train_scores, test_scores"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's call this function; we will use numpy's logspace function to generate a list of alpha values spaced evenly on a log scale."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "alphas = np.logspace(-7, 0, 8)\n",
      "print alphas"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[  1.00000000e-07   1.00000000e-06   1.00000000e-05   1.00000000e-04\n",
        "   1.00000000e-03   1.00000000e-02   1.00000000e-01   1.00000000e+00]\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "train_scores, test_scores = calc_params(X, y, clf, alphas, 'nb__alpha', 3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "nb__alpha  =  1e-07\n",
        "nb__alpha"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "  =  1e-06\n",
        "nb__alpha"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "  =  1e-05\n",
        "nb__alpha"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "  =  0.0001\n",
        "nb__alpha"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "  =  0.001\n",
        "nb__alpha"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "  =  0.01\n",
        "nb__alpha"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "  =  0.1\n",
        "nb__alpha"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "  =  1.0\n"
       ]
      },
      {
       "metadata": {},
       "output_type": "display_data",
       "png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEVCAYAAAD3pQL8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8HOWd5/FPq3W1ZR22jA+MjUC2ObwGDMRADKGJOYxt\n4k0yCYEMyRA2IZMldyYM2eRlkUx2k8k1Q641dw7CMbMhAcTpgIghxMYGDGSMkXyAL2xsWYelltRH\n7R9PtdSSWlJVS9XVLX3fr1e/uqu6nqqfhKifn6OeB0RERERERERERERERERERERERERERCRH3Qkc\nAF4b5phbgEZgC7A4Zf9y4A37uxu9ClBERHLLBZhkMFTiWAE8an8+B/ir/TkINAE1QBHwCnCKZ1GK\niIhjBR6ffz1wZJjvPwD8yv68AagCZgJLMIljFxAF7gNWexaliIg45nXiGMlsYHfK9h5737FD7BcR\nEZ/5nTgAAn4HICIizhX6fP29wJyU7eMwtYuiAfvn2Pv7mT691jp4cLunAYqIjEPbgXmZFva7xvEQ\n8An787lAC2YU1iZgPqZzvBi40j62n4MHt2NZVkavNWvWeFJuqO/T7fcqBsWfm/E7KTdR4k/dTiQs\nenosOjosWlos3n3XYt8+i7fesmhqsti61eLVVy02b7b4618trr12DU8/bfHEExaPPGLx4IMWDzxg\ncc89FnfdZXHrrRZr1w5+rVq1Ju1+J8dcccUa7rrLXOP++y2uvnoNW7dadHXlzu/fzd8OUDuaG7fX\nNY57gQuBaZg+izWY2gTAWsyIqhWYjvAO4Fr7uxhwA/AEZoTVHcDWsQwsHA57Um6o7zO93lifS/GP\nnld/O8MdM57jDwSgqMi8nIhEwowUSjwO0ah59fRALAZVVWHe8x6znfwuGjXfJfctXRpmxoz+30ej\nMH9+mJ4ecxxAWxv8+c/w3HMwdy7MmwfHHw/B4Mjx5/O9Z7yw8tmaNWv8DmFUFL+/8jn+fIs9Hres\nri7Lam+3rOZmy7rhhjVWfb1l3XqrZa1da1533mlZzzxjWbt3W1Yi4XfEwwOs0dx4/e7jmNDy/V8C\nit9f+Rx/vsVeUAAlJeYF8OEPm1pPJALbt0NTExw8CG++aV6hENTWmprI9Om+hu6JfB/RZCdPERF/\ntbWZBNLUBC0tffsrKkwCqa2FKVP8iy9VIBCAUdz/lThERMbYoUMmgWzfDh0dffunTeuriZSV+Ref\nEocSh4jkKMuC/ftNEtm5E7q7zf5AAGbNMknkxBP7msCyRYlDiUNE8kA8Drt3myTy9ttmNBeY/pM5\nc/pGZhVmoedZiUOJQ0TyTDQKu3ZBYyPs3WtqJmCGJNfUmCQye7ZJKl5Q4lDiEJE8FonAjh0miRw8\n2Lc/FDLNWPPmwYwZY3tNJQ4lDhEZJ9raTId6Y2P/kVnl5SaBzJs3NiOzlDiUOERkHDp8uG94b+rI\nrOrqvuG9kydndm4lDiUOERnHLAveecckkB07+kZmgRmZNW+e+5FZShxKHCIyQcTjsGePSSJvvZX5\nyCwlDiUOEZmAkiOzmprMyKxEwux3MjJLiUOJQ0QmuOTIrKYmOHCgb39paf+RWQH7jq/EocQhItKr\nvb2vU/3Ikb795eV9051UVytx+B2DiEhOSo7M2r4djh7t23/99aNLHJpWXURknKquNq8lS8zIrO3b\nTZPWaKnGISIygSQSEAyOrsbh95rjIiKSRWMx/5USh4iIuKLEISIirihxiIiIK0ocIiLiihKHiIi4\nosQhIiKuKHGIiIgrShwiIuKKEoeIiLiixCEiIq54nTiWA28AjcCNab6fAjwIbAE2AAtTvtsFvAq8\nDGz0NEoREXHMy0kOg8A24GJgL/AicBWwNeWYHwBtwHeAk4Cf28cD7ATOApqHuYYmORQRcWm0Czk5\nqXH8mP41AaeWAE2YmkMUuA9YPeCYU4Bn7M/
bgBrgmJTv8332XhGRccdJ4tgK3IppLvosUOnw3LOB\n3Snbe+x9qbYAH7I/LwGOB46zty1gHbAJ+LTDa4qIiMecJI7bgKXAJzA1gteA3wEXjVDOSRvS94Aq\nTD/GDfZ73P7ufGAxcDnwP4ELHJxPREQ85nQFwCBwMqZp6V1MTeErmBrIlUOU2QvMSdmeg6l1pGoH\nPpWyvRNIrk+1z35/F9OBvgRYP/AidXV1vZ/D4TDhcHiEH0VEZGJpaGigoaFhzM7npA/hJ8AVwNPA\n7fQf4bQN06mdTqH9/TJMEtjI4M7xSiAC9GCao5YC/wBMwiSrdqAMeBK42X5Ppc5xERGXRts57qTG\n8SrwTaAjzXfnDFMuhml+egKTBO7AJI3r7e/XAqcCd2OatV4HrrO/m4GpZSRjvIfBSUNERHzgJON8\nCFPbaLG3q4Aw8AePYnJDNQ4REZdGW+NwUnALcPqAfa8AZ2R60TGkxCEi4lI2nuNId/JgphcUEZH8\n5iRxbMY8BFgLzMN0lm/2MigREcldThLH5zFPft+Pefq7C/NchYiITED5PqWH+jhERFzKxnDc6cDX\nMUNnQ/Y+C3h/phcVEZH85aSp6h7M1OgnAnWYSQs3eReSiIjkMidVlZeAMzEPAp5m79sEnO1VUC6o\nqUpExKVsNFX12O/vAKsw04dMyfSCIiKS35wkjn/BPC3+VeCnQAXwZS+DEhGR3DVS4ggCC4BHMFOO\nhL0OSEREcttIneNxzIy2IiIigPNp1YswDwB22GUsTKe539Q5LiLiUjYmOWwg/Wp+I60AmA1KHCIi\nLmUjceQyJQ4REZeyMRx3DabGkWyiSvp2phcVEZH85SRxdNCXMEKYZzn+y7OIREQkp2VSVSnBLON6\n4RjHkgk1VYmIuJSNhZwGKgNmZ3pBERHJb06aql5L+VyAmS1X/RsiIhOUk6pKTcrnGHAAs7BTLlBT\nlYiIS9loqpoJNGOmU9+D6SA/J9MLiohIfnOScV7BTKuesLeDmGnVF3sVlAuqcYiIuJStzvFEyuc4\nJnmIiMgE5CRx7AS+gJmvqhj4IrDDy6BERCR3OUkcnwWWAnsxfRznAp/xMigREcldmqtKRGSCyUYf\nx68xKwAmTQHuzPSCIiKS35wkjtMwq/8lHcGMshIRkQnISeIIAFNTtqfifFTVcuANoBG4Mc33U4AH\ngS3ABmChi7IiIuIDJ1OO/Ah4AXgAk0Q+AnzXQbkg8DPgYkzH+ovAQ8DWlGO+gVlJ8IPAScDP7eOd\nlBURER847eP4EHAQeAdzk/+1g3JLgCbME+dR4D5g9YBjTgGesT9vw0xvMt1hWRER8YHTBwD/hqlx\nPAwcBeY6KDMb2J2yvYfBs+puwSQlMMnieOA4h2VFRMQHTpqqPoBprjoWU+s4HtNktHC4QqRfp3yg\n7wH/DryMmYX3ZcyT6Y7H2NbV1fV+DofDhMNhp0VFRCaEhoYGGhoaxux8Tsbxvgq8H3gKMz/VRcA1\nwKdGKHcuUIfp5Aa4CTN1yfeHKbMTWAT8N4dl9RyHiIhL2XiOIwocso8NYvokznZQbhMwH9NvUQxc\niengTlVpfwfwaeBZTFOYk7IiIuIDJ01VR4ByYD1wD6a56qiDcjHgBuAJTMK5A9PEdb39/VrgVOBu\nTNPU68B1I5QVERGfOamqlAFdmBrHx4EKTAI57GFcTqmpSkTEpdE2VWmuKhGRCSZb63GIiIgAShwi\nIuKSEoeIiLjiZFTV+cAazNDY5PEWcKJHMYmISA5z0jmyDfgSZjLCeMr+Q55E5I46x0VEXBpt57iT\nGkcL8FimFxARkfHFScb5HuYhvN8D3Sn7X/IkIndU4xARcSkbz3E0kH7SwYsyvegYUuIQEXFJDwAq\ncYiIuJKNBwCrgJ8Am+3XjzCTE4qIyATkJHHcCbRhloz9KNAO3OVlUCIikrucVFW2AKc72OcHNVWJ\niLiUjaa
qCHBByvb5QGemFxQRkfzmJOOcAfyavn6NI8AnMbUOv6nGISLiUjZHVVXY722ZXswDShwi\nIi55+eT4NcBvgK/S/zmOgL3940wvKiIi+Wu4xDHJfi8n/QOAIiIyATmpqpwPPOdgnx/UVCUi4lI2\n+jheBhYP2PcScGamFx1DShwiIi552cdxHvBe4BjgKykXKcdMeigiIhPQcImjmL4kUZ6yvw34Oy+D\nEhGR3OWkqlID7PI2jIypqUpExKVsLOTUCfwQOBUI2fss4P2ZXlRERPKXkylH7gHewKwxXoepfWzy\nLiQREcllTqoqyRFUrwKn2fs2AWd7FZQLaqoSEXEpG01VPfb7O8AqYB8wJdMLiohIfnOSOL6LWczp\nq8BPMXNWfdnLoEREJHd5vXTscuDfMEN6bwe+P+D7acBvgZmYJPZD4G77u12Yob9xIAosSXN+NVWJ\niLjk5ZPjP035bKUcm7xTf2GEcweBbcDFwF7gReAqYGvKMXVACXATJolsA2YAMWAncBbQPMw1lDhE\nRFzyciGn5BrjJZjO8TeBRsz0I8UOzr0EaMLUHKLAfcDqAcfsp2+69grgMCZpJHldIxIREZeG6+O4\n237/R8ykhlF7+5c4m+BwNrA7ZXsPcM6AY24DnsZ0uJdj1jRPsoB1mKaqtfaxIiLiMyed41X01QbA\n3OCrHJRz0ob0DeAVIAzUAk9h1jJvB5ZiaiTH2PvfANYPPEFdXV3v53A4TDgcdnBZEZGJo6GhgYaG\nhjE7n5OmoGsxfRHJq15ob989Qrlz7eOW29s3AQn6d5A/ihm19by9/SfgRgY/YLgGOAr8aMB+9XGI\niLjkZR9H0l2YJPAg8Hv7890Oym0C5mPmuioGrgQeGnDMG5jOczCd4icBOzCLSCUnViwDLgVec3BN\nERHx2HBNVadgRkCdhWl2SvZXHGu/Xhrh3DHgBuAJzAirO+zzXW9/vxb435jEtAWTxL6OGUV1IiZJ\nJWO8B3jS4c8kIiIeGq6qchvwaUwTVbr2oIu8CMglNVWJiLiUjRUAc5kSh4iIS17OVfVhhh8Z9fth\nvhMRkXFquMRxBUocIiIygJqqREQmmGxMqw5mOvVTgdKUfd/O9KIiIpK/nDzHsRYzFcgXMBnqo8Dx\nXgYlIiK5y0lV5TVgEX0rAE4GHsfMX+U3NVWJiLiUjSfHI/Z7J2biwhhm/QwREZmAnPRxPIJZKvYH\nmGnWQTPViohMWG6rKqX2q8WDWDKhpioREZey0VT1Kmb681qgi9xJGiIi4gMnieMDmMWUHsDMePs1\nYK6XQYmISO5yW1WZD3wL+Dhmxlu/qalKRMSlbD0AWINZT+OjmNrH1zO9oIiI5DcniWMDZiGmB4CP\nYBZaEhGRCcpJVeVkzEp9uUhNVSIiLmk9DiUOERFXsjEcV0REpJcSh4iIuOIkcXwUqLA/fwt4EDjT\ns4hERCSnOUkc3wLaMLPhLgPuAH7pZVAiIpK7nAzHjdvvqzCTGz4CfMeziERkXLEsi7buNo50HeFI\n5AhHuo7QGe0kVBiirLiMSUWTKCuy3+3twgKnj5iJH5z819kL3ApcAnwPM8mh+kZEpB/Lsmjtbu1N\nDi1dLTRHmmntaiVuxUc+QYqSYElvIhmYVJLbk4omJUcHSZY5+a2XAcsxkx02ArMwCzs96WFcTmk4\nrkiWJayEqUHYCSI1USSsRNoy5cXlTAlNYUrpFKpKq5hcPJlILEJntJOOng7zHu3o/ewk0QQIECoK\nUVZUlrbmkvxcUlgy1r+CvJeN5zhqMbWOLuAizCqAvyI3ZslV4hDxSMJK0NrV2pscWrpaXCWI1ERR\nFCxyde2uWNeQSaUjat4j0QgWI///X1hQ2JtUhmoaKysqI1iQC9PvZUc2EscW4CzMfFWPAn8EFgIr\nMr3oGFLiEBmlgQki+d7a3TpkgqgoqaCqtIqpoalUlVZlnCBGG3ckGkmbV
Dp6Ono/98R7HJ2vJFjS\nL5GkSzKhwtC4aB7LRuJ4GViMmdgwAvw0ZZ/flDhEHIon4rR1t9Ecae6tPQyXIAIEKC8p71d7mBIy\nCSKfOq+j8eiQtZbUBDNUkkxVECggVBjimLJjuGDuBYSKQln4CcZeNmbH7QGuBj4BXGHvy94/K0TE\nlXgi3q+TOvne1t02ZIKoKKnoV3vIxwQxlKJgEZXBSipLK4c8xrKsvuaxYRJMV6zLfN/SQUtXC6sW\nrGJS0aQs/jS5wUnGWQh8FvgLcC9wImaW3O87KLsc+DfM2h23pykzDfgtMBOTxH4I3O2wLKjGIRNY\nPBGnpaulX+3BSYJI1wcxkdr3RyOeiNPe0866HetojjRTUVLBqgWrmFw82e/QXMnWJIclwALAArYB\nUQdlgvaxF2M6118ErgK2phxTZ5/7JkwS2QbMSLnOcGVBiUPGuYSVoL27nbbuNtq622jtbjXvXeY9\nXeewEoT3umPd1DfWc6jzEOXF5axcsJKKkoqRC+aIbDRVhTGjqN6yt+cCnwSeHaHcEqAJ2GVv3wes\npv/Nfz9mlBaYaU0OAzHgPAdlRcaFZN/DwOTQ1t1Ge3f7kCOHAgT6NS0lk4MShPdKCktYtWAVjzU+\nxoGOAzy87WFWLlhJVWmV36FlhZPE8WPgUkwNAEzN4z5Gnq9qNrA7ZXsPcM6AY24Dngb2AeWYebGc\nlhXJG9F4NG1iaOtu42jP0SHLBQhQXlxORUkFlaWVVJRU9L4qSyqVIHxUHCxmxfwVPN70OPuP7u9N\nHlNDU/0OzXNOEkchfUkD4E2H5Zy0IX0DeAVTq6kFngJOd1BOJOd0x7qHTA6d0c4hyxUECtImh8qS\nSiYXT1ZyyGFFwSIun385T25/kj1te3jkzUdYMX8F0yZN8zs0TzlJAJsxndO/xbSJfRzY5KDcXmBO\nyvYcTM0h1XuB79qftwM7gZPs40YqC0BdXV3v53A4TDgcdhCaSGYi0ciQyaEr1jVkuWAg2JcQ0iSH\n8fBswERVWFDIZbWXsW7HOt5qfYtH3nyEy+ddzozJM/wOrVdDQwMNDQ1jdj4nf60lwA3AUnt7PfAL\noHuEcsmayjJMU9RGBndw/xhoBW7GdIpvxvR5tDkoC+oclzFmWRad0c60yaG1q5VoYuhxIUUFRUMm\nB82rNP4lrARP73yaHUd2UFRQxPJ5y5lVPsvvsNLyelRVIfA6Zt3xTFxO35DaO4D/A1xvf7cWM5Lq\nLkyHe4H9/e+GKTuQEodkJN1IpeQrlogNWa44WExlSWXa/oZ8fRhMxo5lWTTsaqCxuZHCgkIurb2U\n4yqO8zusQbIxHPePwBfoG1WVS5Q4ZFjxRJwjXUc43HmYw5HDve/DTUMRKgz11hrKi8t7k0RlSaUm\nzJMRWZbF+rfX88ahNwgGglx84sUcX3W832H1k43EsR4zvchGoMPeZwEfyPSiY0iJQ3p1x7o51HmI\n5kgzhyOHOdR5aMgJ+UKFIapKqwbVHCpKKigOFvsQvYwnlmXxwp4XeP3g6xQEClh2wjJOmHKC32H1\nykbiCKfZZzHycxzZoMQxAVmWRXtP+6BaRLphrQECVJZWUh2qpnpSde/7RJwmQrJvw54NbDmwhQAB\nwjVh5lfP9zskIDsPAL6NeVAvYm+HMFOEiHgunoj31iBSE0W6TurCgsJBCWJqaOq4mG9J8tM5x51D\nYUEhm/dvpmFXA3ErzsnTMu0yzh1OMs5mzJPcyUbhEuA54D1eBeWCahzjSCQaGZQgWrpa0j45XVZU\n1i9BVIeqqSip0MglyUmvvPMKG/duBGDpnKUsnL7Q13iyUeMI0pc0wAzDVSOwZCy5BvXA/oh0D8kV\nBAqYUjqFqaGpVIeqmTZpGlNDUzWCSfLKGTPPIBgI8sKeF3h+9/PEEjFOn5m/zzo7SRyHMPNE/dHe\nXm3vExlRLBEzyWFAf0S6Ia/FwWKmh
qb2JofqkGlq0pPTMh4smrGIwoJC1r+9ng17NxC34pw5a6SZ\nm3KTk6rKPOAe4Fh7ew9wDWYSQr+pqSqHdEY7ByWI1q7WtE1Nk4snD+qPKC8uV1OTjHtvHn6TZ3c9\ni4XFGTPPYMnsJVmPIVvTqoOZhBCgPdOLeUCJI8sSVoKuWBeRaGTQ8xGRWGTQ8cmmpoH9EXoeQiay\n7c3beWbXMySsBIumL+K8Oedl9frZTBy5SIljDMQSMSLRCJFYZMT34eZjKgmWDEoQmuJbJL1dLbtY\nt2MdCSvBqcecytI5S7NW41biUOJIqzvW7SgRRKKRYedfGihAgNLCUkJF5unqZId19aTqvFsFTcRv\nu1t389SOp4glYiyoXsCFx1+YleShxDFBEkdqE5GThJDuaemhBANBQkUhQoWhEd9LC0vVDyEyhva1\n7+PxpseJJWLUTqnlohMuoiBQ4Ok1s5U4lgI19I3CsoBfZ3rRMZTXiSPZRNQZ7aQr1tX77raJKJ3i\nYLGjRBAqCmmKDRGfHTh6gMeaHqMn3kNNVQ3LTljmaRNvNhLHb4ETMQsuxVP2fz7Ti46hnEoc0XiU\nrlhXv5t/6nayxpDcHm4W1oECBAgVmX/xTyqa1O89VBgatE/9CiL55d2Od3m08VG6493MqZjDpbWX\nevb/cTYSx1bgVJyt6JdtniaOZCJwkgS6Yl2uEgGYKTKGag5KTQShohAlwRI1EYmMc4c7D1PfWE9X\nrItjy4/lstrLKAoWjfl1spE4/gP4ImZBpVzjKnFE41HHSSASjRC34iOfNEVhQWFvDSDZgZy6ndyX\n3PbiD0JE8tuRyBHqG+vpjHYyc/JMls9bPubNydlIHA3AGZhp1ZOr/uXMtOotkRbHzUOZJAKnSUCJ\nQETGSmtXK/WN9RztOcr0sulcPu/yMX32ya9p1cEkFL9ZazetdXxwUUGR4yQQKgppVlUR8U17dzv1\njfW0dbdRHapm5YKVlBaWjsm5J/xw3Htfu7fvZm/3E6RupyYKJQIRyScdPR3UN9bT0tXClNIprFyw\nckzWkslG4jgPuAU4BTOlehA4ClRketExlFOjqkRExlokGqG+sZ7mSDOVJZWsXLBy1A/bjjZxOHnK\n5GfA1UAjUApcB/wi0wuKiIhzoaIQqxasYtqkabR2t/Lwtodp7/Z3ykCnjyc2YmoaceAuYLlnEYmI\nSD+lhaWsWrCKGWUzaO9p56FtD9Ha1epbPE4SRwemiWoL8K/AV8j/vhERkbxSHCxmxfwVzJo8i45o\nBw9te4jmSLMvsThJADXAAcyqf1/G9G38Aq3HISKSdbFEjCe3P8metj2UFpayYv4Kpk2a5uoc2RpV\nNQmYA2zL9EIeUeIQkQknnoizbsc63mp9q7cmMr1suuPy2egc/wDwMvCEvb0YeCjTC4qIyOgEC4Jc\nUnsJJ045kZ54D/Vv1rO/fX/Wru8kcdQB5wBH7O2XMZMeioiITwoCBSw7YRnzps4jmojyWNNj7G3b\nm51rOzgmCrQM2Od8sQcREfFEIBDgopqLOHnaycQSMR5vepy3W9/2/LpOEsffgI9j1uKYD/wU+IuX\nQYmIiDOBQIAL5l7AwmMWErfiPLn9SXYe2enpNZ0kjs8DCzETHN4LtAFfcnj+5cAbmOdAbkzz/dcw\nTV8vA68BMaDK/m4X8Kr93UaH1xMRmXACgQBL5y7l9Bmnk7AS/Gnnn2hq9m7gq5fPYwQxo7AuBvYC\nLwJXYdb3SGcVJiFdbG/vBM4ChhuorFFVIiIpNu3bxEv7XyJAgPcd/z5OmnbSoGNGO6pquFn/HsZM\nn57u5E6mVV+CedZjl719H7CaoRPH1ZgaTSo9aCgi4sLZx55NYUEhG/du5Nm3niWWiLFw+sIxvcZw\nieNcYA/mZr7B3pe8kTv5Z/5sYHfK9h7M6Kx0JgGXAZ9L2WcB6zDTnKwFbnNwTRGRCe+MmWcQDAR5\nY
c8LPL/7eeJWnNNmnDZm5x8uccwCLsE0L10F1GOSyN8cnttNG9IVwHP0H721FNgPHAM8hekrWe/i\nnCIiE9aiGYsoLChk/dvr+euevxJPxFk8a/GYnHu4xBEDHrNfJZjk8SzmuY6fOTj3XszT5klzMLWO\ndD7G4Gaq5NMs7wIPYpq+BiWOurq63s/hcJhwOOwgNBGR8e+UY04hWBDk1v+8lYc3P8zMyTOZXTF7\n1OcdqQ+hFFiJubHXYJ4YvxOTFEZSiOkcX4ZZr3wj6TvHK4EdwHFAxN43CdO53g6UAU8CN9vvqdQ5\nLiIygu3N23lm1zMkrASLpi/ivXPfCx51jv8GMwz3UeDbmOGybsSAGzBTlQSBOzBJ43r7++Sar//d\nPiaSUnYGppaRjPEeBicNERFxoHZqLcGCIOt2rOO1g25v5YMNl3ESmCnV07HQCoAiInlld+tuntrx\nFNedeR1M5DXHlThERJzb174v2c+hxCEiIs5kY1p1ERGRXkocIiLiihKHiIi4osQhIiKuKHGIiIgr\nShwiIuKKEoeIiLiixCEiIq4ocYiIiCtKHCIi4ooSh4iIuKLEISIirihxiIiIK0ocIiLiihKHiIi4\nosQhIiKuKHGIiIgrShwiIuKKEoeIiLiixCEiIq4ocYiIiCtKHCIi4ooSh4iIuKLEISIirihxiIiI\nK0ocIiLiiteJYznwBtAI3Jjm+68BL9uv14AYUOWwrIiI+MDLxBEEfoZJAKcCVwGnDDjmh8Bi+3UT\n0AC0OCyb9xoaGvwOYVQUv7/yOf58jh3yP/7R8jJxLAGagF1AFLgPWD3M8VcD92ZYNi/l+x+f4vdX\nPsefz7FD/sc/Wl4mjtnA7pTtPfa+dCYBlwH/L4OyGcn0P/xI5Yb6fiz/0EZzLsU/el797Qx3jOIf\n/bnGe/zZiD3Jy8RhuTj2CuA5TDOV27IZyef/eBP1xuukbC7HP95vXMMdo/hHf65c+NvPhnOBx1O2\nb2LoTu4HgY9lULYJk2T00ksvvfRy/moiRxUC24EaoBh4hfQd3JXAYSCUQVkRERlnLge2YbLbTfa+\n6+1X0ieB3zksKyIiIiIiIiIiIiIiMp6dD/wSuA143udY3AoA3wVuAT7hcyyZCAPrMb//C/0NJWNl\nwIvASr8DycDJmN/9A8B1PseSidXArZiHfi/xOZZMnADcDvyH34G4VAb8CvO7v9rnWHy3Gvi030G4\n9EHgbsx0LO/3N5SMvA94FLgTqPU5lkzdjJlHLR8TR1IBJnnkqyrMDThf5VviuIa+v/f7Rjo412fH\nvRM4gJkgcxJvAAAFFElEQVQAMZXTCRCvJv2IrWzINPYFmFrS14B/9DLAEWQa/3pgBfDPmBuwXzKN\n/xLgv4B3PY1uZKP5278CqMfBDcBDo/1/95uY+er8Mtr4c4GbnyF1to54VqLz0AWYCRBTf/AgZohu\nDVBE3zMe1wA/AY61j5uLqXb5JdPYPw58xD7+/izFms5ofvdgnr/x819dmcb/L/bnJ4A/YJoO/TDa\n3z/AHz2PcmiZxh8Avg8sy2Ks6Yz2958LNQ43P8Pf01fjuJdxoIb+P/h59H+q/J/t10B1mCfQ/VSD\n+9hDmCr6Lfhb44DM4v8g8H8x/9p9n5fBOVBDZn87YJ4vWuFNWI7V4D7+C4F/B9YCX/IyOAdqcB//\nF4BNmH6a6/FXDe7jn4r5+8+VGkkNzn6GSZgayi8ws5EPq3Ds4suadBMgnpPmuLqsROOOk9gjwP/I\nWkTuOIn/QfuVi5z+7YDpKMw1TuJ/1n7lIifx32K/cpGT+JuBz2YtIveG+hk6gU85PUmu93GkY/kd\nwCjkc+yg+P2m+P2V7/HDGP0M+Zg49gJzUrbnYLJmPsjn2EHx+03x+yvf44fx8TM4UkP/Nrp8mgCx\nhvyNHRS/32pQ/H6qIb/jh/HxM7h2L7AP6Ma0y11r78+HCRDzOXZ
Q/H5T/P7K9/hhfPwMIiIiIiIi\nIiIiIiIiIiIiIiIiIiIiIiIiIiIiIjIRNQBneXDeMPDwGBwzluqAr2bxeiK98nGSQ5GhjIfZS52a\nSD+r5BglDsk3NcBWzOqOr2NW6itN+f4a4GXMxG7vcXnuJcBfgJcwy/cuSHNMHfAb+7g36b92ymTM\nym9bgd+m7P8WsNGOaW2ac1YCu1K2y4C3MRPSfdou+wrwn5iFvpKSyaOBvprWNGCn/TkI/MAuvwX4\njL1/FvBn+n5P56eJSURk3KgBosBp9vb9mOV2wdxAkzfmCxi81vJIyjE3W4CLMTdq6N8MVYe54ZYA\n1Zgb/Cz7mBb6lj/9C7DULjMl5Rq/BlalufYf7HMAXEnfssdTU475DnCD/XkN8BX78zPAmfbn1MTx\nGeB/2Z9LgBcxv7+vAN+w9wcwCU/EsXxcAVBkJ/Cq/Xkz5mYI5l/gyfWS1wMV9qvN4XmrMDf2efa5\nitIcY2HW8u62X89gaiotmH/Z77OPe8WO63ng/cA/YZbnnAr8DXhkwHnvxySMBuBjwM/s/Ysw66BX\nYm7wj+PcpXb5v7O3K+yf7UXMMqFFmIS1xcU5RdRUJXmpO+VznL5aQjpu+gK+A/wJc7O9gv5NYMNJ\nDBNXKfBz4MOYWtJtQ5z3YWA5pnZyJvC0vf9u4HN22Zvp31SVFKPv/+WB574BWGy/aoF1mKR6AWZR\nn7sxzXsijilxyHgQSHm/0v58PqYW0O7iPBX01RiuHeKYALCavqaqMOZf8IEhjk/eyA9jagwfIX0y\nO2qf5xZMEkkeMxl4B1M7+PuU/YGUa+4CzrY/J2sXYPp/Pkdfy8ICTK1nLvAucLv9WjxE7CJpqalK\n8tHAG6+V8t6F6dwuBD7l8rz/CvwK+CZQP+A6qdd4FdNENQ34NubGflKauMAkr9swHfnvABuGuf79\nwAP09XWA6VjfgLnRb6CvP8JKud4P7XKfGRD37ZjmspcwSeYg8EH7/P+E6StqBz4xTEwiIjJKa9Dz\nEzLBqalKxD09QyET2lDtsiLjxT8AXxywbz7QOGDfc8DnsxGQiIiIiIiIiIiIiIiIiIiIiIiIiIwn\n/x/m9Scc2nLqZwAAAABJRU5ErkJggg==\n",
       "text": [
        "<matplotlib.figure.Figure at 0x100476490>"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As expected, the training accuracy is always greater than the testing accuracy. The best results are obtained with an alpha value of 0.01 (accuracy of 0.81):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print 'training scores: ', train_scores\n",
      "print 'testing scores: ', test_scores"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "training scores:  [ 1.          1.          1.          1.          1.          1.\n",
        "  0.99683333  0.97416667]\n",
        "testing scores:  [ 0.77133333  0.77666667  0.78233333  0.79433333  0.80333333  0.814\n",
        "  0.80733333  0.74533333]\n"
       ]
      }
     ],
     "prompt_number": 10
    },
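    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Rather than reading the best value off the plot or the printed arrays, we can pick it programmatically. This is a minimal sketch with the scores copied from the output above; the names alpha_grid and cv_test_scores are our own, chosen so that the cell runs on its own and does not overwrite the notebook's variables."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "# values copied from the outputs above, so this cell runs on its own\n",
      "alpha_grid = np.logspace(-7, 0, 8)\n",
      "cv_test_scores = np.array([0.77133333, 0.77666667, 0.78233333, 0.79433333,\n",
      "                           0.80333333, 0.814, 0.80733333, 0.74533333])\n",
      "\n",
      "# index of the highest mean cross-validation accuracy\n",
      "best = np.argmax(cv_test_scores)\n",
      "print('best alpha: %g (accuracy %.3f)' % (alpha_grid[best], cv_test_scores[best]))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },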
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We created a very useful function to graph and obtain the best parameter value for a classifier. Let's use it to adjust another classifier that uses a Support Vector Machine (SVM) instead of MultinomialNB:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.svm import SVC\n",
      "\n",
      "clf = Pipeline([\n",
      "    ('vect', TfidfVectorizer(\n",
      "                stop_words=stop_words,\n",
      "                token_pattern=ur\"\\b[a-z0-9_\\-\\.]+[a-z][a-z0-9_\\-\\.]+\\b\",         \n",
      "    )),\n",
      "    ('svc', SVC()),\n",
      "])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "gammas = np.logspace(-2, 1, 4)\n",
      "\n",
      "train_scores, test_scores = calc_params(X, y, clf, gammas, 'svc__gamma', 3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "svc__gamma  =  0.01\n",
        "svc__gamma"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "  =  0.1\n",
        "svc__gamma"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "  =  1.0\n",
        "svc__gamma"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "  =  10.0\n"
       ]
      },
      {
       "metadata": {},
       "output_type": "display_data",
       "png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEVCAYAAAD6u3K7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xd4XPd54PvvzKB3gGgkOgESBJtIsZMiCZCSSCkyKZG6\n8TobO97c65J7E+fxpjjJXT+mY2cTb7JO1vEmsZ26uV7biSRTkiWRkkiCTey9gUTvvWPQZ8794zcg\nCkHgAJgzZ8r7eR48nDk4c85LDjEvfu39gRBCCCGEEEIIIYQQQgghhBBCCCGEEEIIIYQQAecfgWbg\nzgznfA8oBW4B6z0RlBBCCPPtRH3oPy1BvAy873q8BbjoiaCEEEJ4h2yeniD+Dvj0hOclQIrRAQkh\nhJid1eT7pwG1E57XAekmxSKEEGICsxMEgGXKc82UKIQQQkwSZPL964GMCc/TXccmyc3N1crLyz0W\nlBBC+IlyIG++Lza7BfEO8DnX461AF2rW0yTl5eVommb61ze+8Q3TrzWX1+k5d6Zz5vq9p53vzn83\nb3jvvOX9m+/3Zzve1qbxox9p/OAHGl/9qvnvnbvfP29472Y7Zz7fm+44kLuQD2jbQl6sw0+AbwGZ\nwBeBbmAzsBG4hpreug011XWf65zGaa5z5MiRIwaHqk92drbp15rL6/ScO9M5c/3edMeKi4spLCyc\nNQ6jufO9W8j13Pn+zff7TzuekZHN++/DwACsXg11dd7x3oH87On53tTj3/zmNwG+OWsgTzG1/99b\naa5sKHzQkSNH8JYEL2Z28SLcvg1xcXDoEHz72/Le+TKLxQIL+Jw3u4tJBABv+Q1UzKy+XiUHqxWK\niiAoSN67QCctCCEEw8PwxhvQ1wcbN8Kzz5odkXAHaUEIIRbs/HmVHJKTYb0UvBEukiCECHAVFVBa\nqrqUiorA4iv9CsJwkiCECGD9/XDunHq8dSvExpobj/AukiCECGCnT8PgIGRkwMqVZkcjvI0kCCEC\n1P37UFsLYWGwe7fZ0QhvJAlCiADU3a3WPADs3AkREebGI7yTJAghAoymwalTMDoKy5ZBTo7ZEQlv\nJQlCiABz4wa0tEBUFOzYYXY0wptJghAigLS2wvXr6nFhIYSEmBqO8HKSIIQIEKOjqmvJ6YS1a2HJ\nErMjEt5OEoQQAeLyZejqgvh42LTJ7GiEL5AEIUQAqKuDu3dVIb49e8BmdKF/4RckQQjh54aG1II4\nUIX4Fi0yNx7hOyRBCOHnzp0Dux1SU+GZZ8yORvgSSRBC+LGyMigvh+BgNWtJCvGJuZAEIYSfsttV\nGW+AbdsgJsbceITvkQQhhB/SNCguVuMPWVmwYoXZEQlfJAlCCD90757aQjQ8HHbtMjsa4askQQjh\nZzo71ZoHUIX4wsPNjUf4LkkQQvgRp3O8EF9+PmRnmx2R8GWSIITwI9evQ1sbREfD9u1mRyN8nSQI\nIfxEc7Oq1GqxqL2lg4PNjkj4OkkQQviBkRHVtaRpajFcaqrZEQl/IAlCCD9w6RL09KgyGhs2mB2N\n8BeSIITwcTU1an9pm011LUkhPuEukiCE8GGDg3DmjHq8aRMkJJgbj/AvkiCE8GFnz0J/v9r8Z80a\ns6MR/kYShBA+6tEjqKxU24ZKIT5hBD0J4rvAKqMDEULo19sLn3yiHm/fDlFR5sYj/JOeBPEA+CFw\nGfgyEGtoREKIGWma2gBoeBhycmD5crMjEv5KT4L4EbAD+ByQDdwB/jdQZFxYQoinuXMHGhogIkLV\nWhLCKHrHIGzACqAAaAVuAf8Z+JlBcQkhptHRAVeuqMe7dkFYmLnxCP8WpOOcvwQ+BZwE/gTV1QTw\nHeChQXEJIaZwONRqaYcDCgogM9PsiIS/05MgbgP/BbBP870t7g1HCPE0165Be7vaGW7rVrOjEYFA\nTxdTNzCx7Fcc8Krrcdcsr90PlAClwNem+X4
icAy4CdwFPq8jHiECTlMT3LolhfiEZ+mZOX0LeGbK\nsZvAulleZ0N1QT0P1ANXgM+gZkWNOQKEAn+IShYPgRRgdMq1NE3TdIQqhP8ZGYE33lBTW599FjZu\nNDsi4SssanHMvFfI6GlBTHdxPdVeNgNlQBUwAvwUODjlnEZgbCv1GKCdJ5ODEAHtwgWVHBITVYIQ\nwlP0JIhrqMVyuUAeatD6mo7XpQG1E57XuY5N9CPUIrwGVEvlt3VcV4iAUV0NJSXjhfisUvtAeJCe\nQerfAr7O+JTWj4D/R8fr9PQJ/RGqu6oQlYA+QnVn9U498ciRI48fFxYWUlhYqOPyQviugYHxQnxb\ntkB8vLnxCO9XXFxMcXGx265nZPWWragxhv2u538IOFHTY8e8j5o6e971/ARqMPvqlGvJGIQIOMeP\nqxZEWhq8/LLUWhJzt9AxCD0tiGTg94GVQLjrmAbsmeV1V4FlqNXXDcCnUYPUE5WgBrHPowan84EK\nHTEJ4ddKSlRykEJ8wkx6ejR/jPogX4pqEVTx5G/40xkFfhM4DtxHdVE9AL7k+gL4r8BG1PjDx6hE\n1KE3eCH8UU+PGpgGeO45iIw0Nx4RuPT8XnIdeBa1YG6t69hV1Ae7p0gXkwgImgbvvqvWPeTmwt69\nZkckfJknupiGXX82Aa+guotkuEwIA9y6pZJDRIRqPQhhJj0J4tuo1dO/A/w1ar3CV40MSohA1N4O\nV12dt4WFEBpqajhCzJogbMBy4BeoshqFRgckRCByOODkSXA6YdUqSE83OyIhZh+kdvDkzCMhhJtd\nuQKdnRAXp9Y8COEN9HQxnQO+j5qFZEcNeGiowWshxAI1NMDt22qVdFERBOn5qRTCA/T8V1yPSgh/\nPOW47CgnxAIND8PYwtf16yEpydRwhJhET4IoNDoIIQLVJ59AXx8kJ6sE4U3qeuroGuxiVdKqsemS\nIsDoSRDfQLUgxrqWxkxtUQgh5qCyEh49Ul1K3laIr7a7luPlx3FqThxOB8+kTq34LwKBnv+SdtdX\nH6qW0suo8hlCiHnq74ezZ9XjrVshNtbceCbqGOjgROUJnJoTgCsNV2i1t5oclTDDfNqNocCHwG43\nxzITWUkt/MqxY1BTAxkZ8NJLZkczrn+kn6MlR+kb7iMvIY+woDDuttwlNjSWQwWHCLbJVna+xBMb\nBk0VyZP7OgghdHrwQCWHsDDY7clfs2Yx6hzleNlx+ob7SI1KZXfWbrakbWFR+CK6h7r5pPYTs0MU\nHqYnQdyZ8HUPtS3o/zAyKCH8VXf35EJ8ERHmxjNG0zROVZ6itb+VmNAYXsx9EZvVhs1qY0/OHoKs\nQTxsf0h5R7nZoQoP0jNI/akJj0eBZtQWokKIOdA0OHUKRkdh2TJYutTsiMZdrr9MZVclIbYQ9uft\nJywo7PH34sPj2Za+jbM1Zzlbc5bkyGSiQ6NNjFZ4ip4WRCqqBHcVatvQcEDWegoxRzduQEsLREXB\njh1mRzOupK2EW823sFqsvJj7InFhcU+cU5BUQHZcNsOOYU5VnULGBAODngTxd6gZTGPsrmNCCJ1a\nW+G6q/ZAYaHaCMgb1PfUc67mHAA7M3eyJHrJU8/dnbWbyOBImvqauN4ohRQCgd5BaueExw5UET8h\nhA6jo6pryemENWtgydM/gz2qc6CTjyo+wqk5WZe6jvzE/BnPDw0KpSinCAsWrjdep7mv2UORCrPo\nSRCVwFeAYCAE+G1kW1AhdLt8Gbq6ID4eNm82OxplYGSAY2XHGHYMszR+KZuWbNL1uiXRS3gm9Rk0\nNE5WnmTYMTz7i4TP0pMgvgzsAOpRYxBbgS8aGZQQ/qK+Hu7eVauk9+wBmxe0vR1OB8fLj9M73Ety\nZDJF2UVzKqWxcclGkiOT6R3u5Wz1WQMjFWbTkyCagU8Dya6vzwAtRgYlhD8YGhovxLdhAyxaZGo4\ngGs6a9U
pWuwtRIdEsy93Hzbr3LKW1WJlT84egq3BlHeW86j9kUHRCrPpSRD/C7Wj3Jh44B+NCUcI\n/3H+PNjtkJoK69aZHY1ypeEKFZ0Vj6ezhgeHz+s6MaExPJep9kQ9X3Oe7sFud4YpvISeBLEWtZvc\nmE7gWWPCEcI/lJdDWRkEB6tZS95QDPVh20NuNt3EarHywtIXiA9f2NbyyxYtIy8hjxHnCCcrTz6u\n3ST8h54EYQESJjxPQGYxCfFUdjucUzNH2bYNYmLMjQfUdNazNWq84LnM50iLcU+1nOcynyM6JJrW\n/lauNlx1yzWF99CTIP47cAH4FvBt1+M/NzIoIXyVpsHp02r8ISsLVqwwOyLoGux6PJ31mZRnWJHo\nvqBCbCHsydmDBQu3mm5R31PvtmsL8+kdgziEGphuAl5zHRNCTHH/PtTVqUJ8u3aZHc3k6aw5cTls\nTnP/PNuUqBQ2LNmAhkZxVTGDo4Nuv4cwh96FcveAfwPeRa2qzjQsIiF8VFcXXLqkHu/aBeHzG/91\nG4fTwYflH9Iz1ENSRJJa5GbQYMj61PWkRqViH7FzpvqMIfcQnqcnQRwASlGL44pRNZk+MC4kIXyP\n0zleiC8/H7KzzY1H09Rv8832ZqJCotiXt48gq57anPNjsVjYk7OHEFsIVV1V3G+9b9i9hOfoSRDf\nBrYBj4AcYC9wycighPA116+rekvR0bB9u9nRwLXGa5R3lj+ezhoRbHxd8aiQKHZlqX61i3UX6Rzo\nNPyewlh6EsQI0OY61wacAjYaGZQQvqSlRVVqtVjUlNZgkzdde9T+iOuN17FarOzN2UtCeMLsL3KT\npfFLWZG4glHnKCcqT+BwOjx2b+F+ehJEJxANnAV+DHyPydVdhQhYY4X4NA3WroXFi82Np7G38fEY\nwPaM7WTEZng8hm3p24gNjaVjoINL9dLZ4Mv0JIiDQD/wVeAYUMbkTYSECFgXL6pd4hISYKPJ7eru\nwW4+LP8Qp+ZkbcpaViatNCWOYFswe5fuxWqxcrflLjXdNabEIRZOT4Kwo0p8jwD/jGpBtBsYkxA+\nobZWTWu12cwvxDc4OsgHZR8w5BgiOy6bLWnm7umVGJH4uELs6arT9I/0mxqPmB+901yFEBMMDqoF\ncaBaDgme6+Z/wsTprIkRiWrhmhfU9libspb0mHQGRgcoriqWXeh8kCQIIebh3Dno71djDmvXmhvL\n6erTNPU1ERUSxf68/YZOZ50Li8VCYXYhYUFh1PXUcbflrtkhiTmSBCHEHJWWQkWF2jbU7EJ81xqu\nUdZRRrA1mH25+zwynXUuIoIj2J21G4BL9Zdo75feaV+iJ0E8B3yEWixX6fqSHeVEQOrrU2W8Qa13\niI42L5bS9lKuNV7DgoXnlz7Poggv2HBiGllxWaxKWoVTc3Ki8gSjzlGzQxI66UkQ/wB8F5UoNrm+\n9BZ02Q+UoJLL155yTiFwA7iLWqkthFfSNLUB0PAw5OTA8uXmxdLU12T6dNa52Jq+lYTwBLoGu7hQ\ne8HscIROehJEF6q0RjNqwdzY12xswPdRSWIlaie6ginnxAH/EzVtdjXwuq6ohTDB3bvQ0AAREbBz\np3lxjE1ndWgOVievZlXyKvOC0clmtbEnZw82i40HbQ+o7Kw0OyShg54EcQpV3nsbaqOgsa/ZbEat\nmahCTZH9KWpNxUS/AryJ2usa9CUeITyusxMuX1aPd+1S1VrNMDQ6xLGyYwyODpIVm8W29G3mBDIP\nCeEJbE3fCsCZ6jPYh+0mRyRmo2e6w1ZA48nyGkWzvC4NqJ3wvA6YOjl7GRCMSkLRwP8A/lVHTEJ4\njMMBJ0+qPwsKINOkWsZj01m7h7q9ajrrXKxKXkVdTx3V3dWcrDzJK8tf8bm/QyDRkyAK53ltPZOe\ng1Gtkb1ABGozoouoMQshvMK1a9DernaG27rVvDjO1pylsa+RyOBI9uXuI
9hmctGnedqdvZs37r9B\nY18jN5tusn7xerNDEk+hJ0HEAd8AxrY/KQb+GJhtl/J6YOLIWQbjXUljalHdSgOurzPAM0yTII4c\nOfL4cWFhIYWFhTpCF2Jhmprg1i01lbWoyLxCfDcab/Co/RFB1iD25e0jMiTSnEDcICwojKLsIt4r\nfY9rjddIi0kjOTLZ7LD8QnFxMcXFxW67np623VvAHeBfXOd/FliL2mVuJkHAQ1TroAG4jBqofjDh\nnBWogex9QCiqjPinganF5DVZhSk8bWQE3nwTenpg/XrYtMmcOMo7yjlReQILFl7MfZGsuCxzAnGz\nS3WXuNV8i5jQGA4VHCLEFmJ2SH7H1X037z48PYPUuagWRAVQDhxxHZvNKPCbwHHUB/7PUMnhS64v\nUFNgjwG3UcnhRzyZHIQwxYULKjkkJsKGDebE0NzXTHFVMaCmivpLcgDYlLaJxIhEeoZ6OF9z3uxw\nxDT0ZJaLwO+hyn2DWg8xNqvJU6QFITyquhqOH1cF+A4dgvh4z8fQM9TD0ZKjDI4OsippFTsyd3g+\nCIN1D3bz5oM3GXWOUpRdxLJFy8wOya94ogXxZdRahWrX1/ddx4TwSwMDcMa1rfLmzeYkh4nTWTNj\nM9me4QXb1BkgNiyWHRkq8Z2vPU/PUI/JEYmJ9CSIm6gxhzWur3XALSODEsJMZ8+qJJGWBqtXe/7+\nTs3JRxUf0TXYRUJ4Antz9vr1VND8xHyWxi9l2DHMycqTODWn2SEJl5lmMX0WtSbhd5g8ZdXiev5d\nA+MSwhQPH0JVlbmF+M5Wn6Wht4GI4Aj25+332emsc7Ezcyct9hZa7C1cb7zOxiWyq7E3mKkFMVYW\nMnrKV5TrTyH8Sm8vfPKJevzccxBpwkzSm003edj+UE1nzd1HVEiU54MwQWhQqFr4h4UbjTdo7G00\nOySBvsGL54BzOo4ZSQaphaE0Dd59V617yM2FvXs9H0NFZwUfV3yMBQsv5L5Adly254Mw2bWGa1xr\nvEZkcCSvr3yd0KBQs0PyaZ4YpP7raY59b743FMIb3b6tkkNEhGo9eFqLvYVTlacA2JK+JSCTA8Cz\ni58lJTIF+4j9cbVaYZ6ZxiC2AduBJOA/M56FolGVWoXwC+3tcOWKelxYCKEe/qW1d6iX42XHcWgO\nViatZG2KyVvUmchisbAnZw9vPniTyq5KStpKWJG4wuywAtZMLYgQxpPB2NhDFNCDlOUWfsLhgFOn\nwOmEVasgPd2z9x92DHOs7BgDowOkx6T77XTWuYgOjWZnpqqn/kntJ3QNdpkcUeDS0zeVjSrZbSYZ\ngxCGuHhRdS/FxakFcUEe3M7ZqTn5oPQD6nvrSQhP4ED+ASk3MUFxVTGP2h+RGJHIwfyD2KzScTFX\nnhiD6Af+AngfVZb7FHByvjcUwls0NsKdO2C1qkJ8nkwOAOdqzlHfW094UDj7cvdJcphiR8YOYkJj\naOtv40rDFbPDCUh6EsSPUTWTlqLqMFUBV40LSQjjDQ+rriVNU4X4kpI8e/9bTbcoaSt5XJ01OlRm\njk8VbAtmb85erBYrt5tvU9tdO/uLhFvpSRCLgL8HhoHTwH8C9hgZlBBG++QT6OuD5GSVIDypsrOS\nS/WXACjKLpJS1zNIikx6vGiuuKqYgZEBkyMKLHoSxLDrzybgFdQGPyZUpxHCPSor4dEj1aVUVKS6\nmDyl1d7KqSrXdNa0LeTE53ju5j7qmZRnWBK9hIHRAU5XnzY7nICi50fjT1CbBv0O8Luo1sRXjQxK\nCKP096taSwBbtkBsrOfu3Tfcx/Hy44w6R1mRuIJnUp/x3M19mMVioSi7iLCgMGq6a7jbctfskAKG\nngTxLtCF2jSoENWCeMfAmIQwzJkzMDgIGRlqWqunjE1n7R/pJy06jecyTViN58MiQyLZlaU2tbxU\nd4mOgQ6TIwoMM83bmLiCWmN8qtTYf
NOvGBKREAZ58ABqatRCuN27PXdfTdM4UXGCjoEO4sPieSH3\nBawWD/Zr+YnsuGxWJq3kfut9TlSc4LWC1wiyenjqWYCZ6X/pNddXKKrV8Ai1V/R61CI6IXxGd7fa\nIQ5g505VUsNTzteep7anlvCgcPbn7ZfprAuwNX0r8WHxdA52crHuotnh+D09CyguoYrzjbieB6MK\n9W0xKqhpyEI5MW+aBm+/DS0tkJcHezw4B+9O8x0u1F3AZrHxyvJXSIlK8dzN/VR7fztHS47i0By8\nmPtiwNat0sMTC+XigJgJz6Ndx4TwCTdvquQQFQU7PLhrZ1VX1ePfcotyiiQ5uMmiiEVsSVe/n56p\nPoN92G5yRP5LT4L4M+A68C+ur+vAnxoZlBDu0tYG166px7t3e64QX1t/GycrT6KhsWnJJpbGL/XM\njQPE6uTVZMZmMjg6yKmqU0gPgzH0JIh/ArYCPwfecj3+ZwNjEsItHA44eVIV4luzRm0h6gn2YTvH\nyo4x6hwlf1E+6xd7eCVegNidtZuI4Agaehu41Sy7IBthpgRR4PpzA7AYqAXqgCWoQWshvNqlS9DV\nBfHxsHmzZ+454hh5PJ11SfQSdmbt9MyNA1B4cDi7s9R0tKsNV2m1t5ockf+ZafDiR8AXgGIm70k9\npsiIgJ5CBqnFnNTXw3vvqVXSr74KiYnG31PTNI6XH6emu4a4sDgO5h+UHdE84GLdRW433yYmNIbD\nBYcDYg9vvRY6SG3CluzzIglC6DY0BG+8AXY7bNrkuVpL52vOc6/1HmFBYby64lViQmNmf5FYMIfT\nwdGSo7QPtLN80XIKswvNDslrLDRBzLTK5DDTtxzGvDXfmwphpPPnVXJISYF16zxzz7std7nXeg+b\nxcaLuS9KcvAgm9XG3qV7eevBWzxqf0RGTAa5Cblmh+UXZkoQn0IShPAx5eVQVgbBwaoQn8UDbeTq\nrmou1KpVeLuzd5MalWr8TcUkcWFxbEvfxtmas5ytOUtyZLKUUHeDmRLE5z0VhBDuYLfDuXPq8dat\nEOOBX+Lb+9sfT2fduGQjeQl5xt9UTKsgqYC6njoquyo5WXmSA/kHxrpYxDzpLWTyCrASCJtw7I/d\nH44Q86NpcPq0Gn/IzISCgtlfs1Bj01lHnCMsX7ScZxfL5D6z7craRYu9hWZ7M9cbr7NhyQazQ/Jp\netZB/AD4ZVRxPovrcZaRQQkxV/fvQ10dhIV5phDfiGOE4+XHsY/YWRy1mJ2ZMp3VG4QGhbInZw8W\nLFxvvE5TX5PZIfk0PQliO/A5oAP4JmqhXL6RQQkxF11das0DqEJ84eHG3k/TNE5WnqStv43Y0Fhe\nzH0Rm9Vm7E2FboujF7MudR0a6n0aGh0yOySfpSdBjO3x1w+kAaOAjMIJr+B0qr2lR0dh+XLI8cAG\nbRfrLlLdXU1YUBj78/bLWgcvtGHJBpIjk+kb7uNczTmzw/FZehLEL1BbjP45qvx3FfATA2MSQrcb\nN6C1FaKjYft24+93r+Ued1ruYLVYeWHpC8SGeXBLOqGb1WJlT84egq3BlHeW87Dtodkh+aS5DvGH\nub66DIhlJrJQTjyhpQXeeUcNUL/yCixebOz9artrOVZ2DA2Nouwili1aZuwNxYKVtpdyquoUwdZg\nDhUcCriE7oly37eBPwJygUE8nxyEeMLoqOpacjph7Vrjk0N7fzsfV3yMhsaGxRskOfiIZYuWsSxh\nGSPOEU5UnsCpOc0OyafoSRAHAAfwb8BV4HeBTCODEmI2Fy+qXeISEmDjRmPv1T/Sz/Hy44w4R8hL\nyJOpkz5mR+YOYkJjaOtv40r9FbPD8Sl6EkQV8B1UVdfPAGuBSgNjEmJGtbVqWqvNpnaHsxk4gWjU\nOcqxsmP0DfeRGpX6uHqo8B0hthCKsouwWqzcar5FfU+92SH5DL07p2cDXwN+CqwAfl/n6/YDJai9\nr
L82w3mbULOjDum8rghQQ0NqQRyolkNCgnH3mjidNSY0Rqaz+rCUqBQ2LFYtv1NVpxgcHTQ5It+g\nJ0FcQm0WZAX+D2Az8N91vM4GfB+VJFaiWh/TrW+1oVoox/Cd6rLCJGfPQn+/GnNYu9bYe12qv0RV\nVxWhtlD25+0nLChs9hcJr7UudR2LoxbTP9LP6arTZofjE/QkiF8D1qO2Ga2Yw7U3A2WoLqoRVOvj\n4DTn/RbwBiC7fYgZlZZCRYUqxFdYaGwhvgetD7jdfFtNZ819gbgw2Ybd11ksFopyigi1hVLdXc29\nlntmh+T19CSIknleOw21C92YOtexqeccBP7W9Vzmsopp9fWpMt6g1jtEG1ios66njvO16ma7snax\nJHqJcTcTHhUVEsWurF2AWvDYOdBpckTeTe8YxHzo+bD/K+APXOdakC4mMQ1Ng+JiGB6G7GzIN7DQ\nS+dAJx9XfIxTc7I+dT3LFy037mbCFDnxOaxIXIFDc3Ci8gQOp8PskLyW3mqu81EPZEx4noFqRUy0\nAdX1BJAIvITqjnpn6sWOHDny+HFhYSGFhYXui1R4tbt3oaFB1Vjatcu4+wyMDPBB2QcMO4bJjc9l\n4xKD588K02zP2E5TXxMdAx1cqr/E9gwPLMP3gOLiYoqLi912PT2/sf8yagC5B/g68CzwLeD6LK8L\nAh4Ce4EG4DJqoPrBU87/J+Bdpt+ISFZSB6jOTnjrLXA4YN8+yDKojvCoc5RfPPoFLfYWUiJTeGX5\nKzJjyc+19bdxtOQoTs3J/rz9ZMb63/IuT6yk/joqOTyH+rD/B8bHDGYyCvwmcBy4D/wMlRy+5PoS\nYkZjhfgcDlixwrjkoGkaxVXFtNhbZDprAEmMSGRz2mYAiquK6R/pNzki76Mns9wE1gF/BtwBfgzc\nQM1s8hRpQQSgK1dUMb6YGDh8WM1eMsLl+svcbLpJiC2EV1e8KjOWAoimaXxQ9gF1PXWkx6TzUt5L\nfrULnSdaEPXAD4FPA++hivUZObgtBM3NcPOmmspaVGRccihpK+Fm083H1VklOQQWi8VCUXYR4UHh\n1PXUcafljtkheRU9H/S/jOomehFVqC8e+D0jgxKBbWREdS1pGqxbBykpxtynvqf+8V4BOzN3khYz\ndRa2CAThweHszlYlVC7XX6atv83kiLyHngSRimo5lAJFqIRx2cigRGC7cAF6eiAxETYYVBevc6CT\njyo+wqk5WZe6jvxE2SQxkGXGZrI6eTVOzcmJihOMOEbMDskr6EkQb6EGnPNQ+1OnA//byKBE4Kqu\nhpISVYCOkByoAAAanklEQVSvqAisBnRmDowMcKzsGMOOYZbGL2XTkk3uv4nwOVvStpAQnkD3UDcX\n6i6YHY5X0PPj52S8kN5fo7qXDK6+LwLRwACcOaMeb94M8fHuv4fD6eB4+XF6h3tJjkymKLvIrwYl\nxfzZrDb25uwlyBpESVsJFZ1zqSzkn/QkiGHgV4DPobYfBTBoyFAEsrNnVZJYsgRWr3b/9TVN41TV\nKVrsLUSHRLMvd59MZxWTxIfHszV9KwBnqs/QN9xnckTm0pMgfh3YBvwJah+IpcC/GhmUCDwPH0JV\nFYSEGFeI72rDVSo6KwixhbA/bz/hweHuv4nweSuTVpIdl82wY5iTlScJ5Cn2ehLEPdQucneB1agC\nfN8xMigRWHp74ZNP1OMdOyAqyv33eNj2kBtNN7BarDy/9Hniww3ovxJ+Y1fWLiKCI2jqa+JG0w2z\nwzGNngRRCDwC/ifwN6jZTLKtlnALTVNTWkdGYOlSWGbAVs8NvQ2crTkLwI6MHaTHpLv/JsKvhAWF\nqfEpLFxvvE5zX7PZIZlCT4L4LmoNxC7X14vAXxoZlAgct29DUxNERMDOne6/ftdgFx+Vq+msa1PW\nUpA03Z5VQjwpLSaNZ1Kfwak5OVl5kmHHsNkheZyeBDFWdG/MI4y
tAisCRHu7KqcBsHs3hIa69/qD\no4McKzvGkGOI7LhstqRtce8NhN/buGQjSRFJ9A73Pl5UGUj0JIhrwN+jupqKXI+vGhiTCAAOh+pa\ncjph5UrIyJj9NXO6vtPBh+Uf0jPUQ1JEEnty9sh0VjFnVouVPTl7CLYGU9ZRRml7qdkheZSeBPFl\nVBXWr6C2B70H/IaRQQn/d/UqdHRAbCxs3er+65+uPk1TXxNRIVHsy9tHkFUavWJ+YsNi2ZG5A4Bz\nNefoGeoxOSLPme1XqiDU7KUVHohlJlLN1Y80NsIvfqGmsh44AMnJ7r3+1YarXG+8TogthAP5B0gI\nT3DvDURAOlFxgvLOcpIjkzmQfwCrxftrlhpdzXUUNf5gUCV+EWiGh8cL8a1f7/7kUNpeyvXG61iw\nsDdnryQH4TY7s3YSFRJFi72Faw3XzA7HI/SkwARUt9JJ1I5v7zLNlqBC6PHJJ9DXB0lJKkG4U2Nv\nI6erTwOwI3MHGbFuHtgQAS3EFqLGsrBws+kmDb0NZodkOD0ds1+f5pj094g5q6yER48gKMj9hfi6\nB7v5sPxDnJqTNclrWJm00n0XF8IlNSqVZxc/y7XGa5yqPMXrK18nNMjN0++8iJ4f0RrgElDs+rrk\nOiaEbgMDqtYSwJYtEOfGfXmmTmcdq6UjhBGeXfwsqVGp2Efsj1us/kpPgvh3wDHhuRP4N2PCEf7q\n9GkYHIT0dDWt1V3GprN2D3WTGJEo1VmF4SwWC3ty9hBiC6Gqq4oHrQ/MDskwehKEDVXRdcwQEGJM\nOMIfPXgANTVqIdzu3e4txHem+gxNfU1EBkeyL3cfwTYpNCyMFxUSxc5MtfT/Qt0FOgc6TY7IGHoS\nRBtwcMLzg65jQsyqp0ftEAfw3HMQGem+a19ruEZpRynB1mD25+0nMsSNFxdiFrkJueQvymfUOcrJ\nypM4nI7ZX+Rj9C6U+yNUFdda4A+ALxkZlPAPY4X4RkchLw9yc9137bKOMq41XlPTWZfuZVHEIvdd\nXAidtmdsJzY0lvaBdi7X+99OzHNp7Ee7/uw1IpBZyEI5HzIyovZ3uHtXtSAiI+H1191Xa6mpr4n3\nHr2HQ3OwPWM7q5MN2F1ICJ1a7a28/fBtnJqTl/Je8qrp1QtdKOcro3mSIHxATw/cu6eSw7Br1Co6\nGvbsgZQUN91jqIejJUcZHB1kdfJqtmdsd8+FhViAW023uFR/ifCgcF5f+brXbEYlCUKYrqEB7txR\nA9Fjb9PYtqFZWe4blB4aHeLth2/TNdhFZmwm+3L3yYwl4RU0TeP90vep760nIyaD/Xn7veL/piQI\nYQqHA0pLVTdSR4c6ZrOpsYbVq2GRm4cEnJqT90vfp6G3gUXhiziQf0BmLAmvYh+28+aDNxkcHWRb\n+jbWpKwxOySPJYgdQDbjK6814H/N96bzIAnCS9jtcP++mro6OKiORUSotQ0FBRBuUMu6uKqYR+2P\niAiO4LUVr8mMJeGVqruqOV5+HJvFxqsrXjV98oQnEsT/BywFbjJ5wdxvzfem8yAJwmQtLaobqbJS\n7eEAqp7SmjVqq1B3ls2Y6kbjDa40XCHIGsSB/AMkRiQadzMhFuhczTnut94nLiyOQwWHTC01v9AE\noSfyDcBKpP5SwHE6oaJCdSO1tKhjVquarrp6tfsGnmdS3lHOlYYrj6uzSnIQ3m5b+jYaexvpHOzk\nQu0FdmYZsJeuh+hJEHeBxYD/ly4UgOo6evBAzUjq71fHwsJgxQpYtcq9i91m0tzXTHFVMQBb07eS\nFSdV54X3s1lt7F26l58/+DkP2h6QHpNOTnyO2WHNi54EkQTcBy6jymyAak0cMCooYY6ODtVaKC1V\ng9AA8fGqtbBsmarC6im9Q70cLz+OQ3OwMmmlVwz4CaFXQngCW9O3cr72PGeqz5AcmeyT42Z6+qYK\nn3K82H1hzErGIAyiaVBdrRJ
Dg6uNaLFAZqZKDGlpno9p4nRWb5oyKMRcHSs7Rk13DUuil/BLy37J\n4/+PZZqrmJfhYbWg7d49tcANIDgY8vNVYoiJMScup+bkg9IPqO+tJyE8gYP5B2U6q/BZAyMDvPng\nTfpH+tmctpl1qes8en9PJIhtwPeAAiAUVd21D/DkR4gkCDfp7h5f7Twyoo7FxKixhfx8CDG5Tu+Z\n6jOUtJUQERzBqyteJSokytyAhFigup463i99H6vFyoH8AyRHunmf3Rl4IkFcA/4Dag+IjcDngHxU\n0T5PkQSxQHV1qhupZsJWT2lpqrWQmeneEtzzdbPpJpfrLxNkDeJTyz9FUmSS2SEJ4RYX6y5yu/k2\nMaExHC447LFWsSemuQKUoloODuCfUGsiPJkgxDyMjo6vdu50lau32dSA8+rVkJBgbnwTVXRWcLn+\nMhYsFGUXSXIQfmXTkk009DbQ1t/G+drzFGYXmh2SLnoShB3VtXQL+G9AE3PLSPuBv0IlmL8HvjPl\n+/8R+H3XNXuB3wBuz+H6Yoq+vvHVzkOueWeRkeOrncPCzI1vqhZ7y+PprJvTNvvslEAhnsZmtbE3\nZy9vPniTR+2PSI9JJy8hz+ywZqXngz4baEbtIvdV1NjD3wBlOl5rAx4CzwP1wBXgM8DEPfq2oabR\ndqOSyRFg6qbC0sWkQ3OzWu1cVTW+2jk5Wa12zskxdrXzfPUO9XK05CgDowMUJBb49KIiIWZT0lbC\nmeozhNhCOFxwmOjQ6NlftACemsUUAWSgPuznYhvwDdQHP4x3S/3ZU86PB+4A6VOOS4J4CodjfLVz\na6s6ZrWq8herV6sE4a2GHcO8XfI2nYOdpMeksz9vP1aLF2YxIdzo44qPqeisICUyhQP5Bwyd+uqJ\nMYgDwJ+jupmygfXAN9G3UC4NtQvdmDpgywzn/5/A+zquG/AGBlQX0v37k1c7j3UjeWq183w5NScf\nV3xM52An8WHxPL/0eUkOIiDszNxJi72FZnsz1xqvsXHJRrNDeio9CeII6kP9lOv5DVTxPj3m8mt/\nEfDrqMqxTwZx5Mjjx4WFhRQWFs7h0v6jvV11I5WXj692TkhQ3Uh5eWoQ2hecrzlPXU8d4UHh7M/b\nT4jN5Pm1QnhIaFAoRdlF/OLRL7jReIO06DQWRy92y7WLi4spLi52y7VAX9PjEipB3EC1HkANIq/V\n8dqtqAQz1sX0h4CTJweq1wJvuc6bbmwjoLuYNE2NK9y9C42N6pjFojbjWb1abc7jS2433+Zi3UWC\nrEG8svwVj84LF8JbXG24yvXG60SFRHG44DChQW7ak3cCT3Qx3UPNNAoClgFfAT7Ref2rrtdko4r9\nfRo1SD1RJio5/Cr6Br4DxtDQ+GrnXtdO4CEhakHbqlXmrXZeiKquKi7VXQKgKLtIkoMIWM8ufpb6\nnnqa7c2crTnL80ufNzukJ+jJLJHA/wu86Hp+HPgWMKjzHi8xPs31H4A/Bb7k+t4PUFNfXwPGlnCN\nAJunXCOgWhBdXaq18OiRWssAEBurWgvLl6uSGL6o1d7Ku4/eZdQ5akrZASG8Te9QL28+eJNhxzC7\ns3aTn5jv1utLLSY/oWnjq51rJwzrp6erxJCR4R2rneerb7iPoyVH6R/pZ0XiCnZl7TI7JCG8QllH\nGScrTxJkDeJwwWFiw2Lddm0jE8S7qEHm6c7xdLlvv00QIyPjq527utSxoKDx1c7x8ebG5w4jjhHe\nfvg2HQMdpEWn8dKyl2TGkhATjG2pmxiRyMH8g9is7pltYuQYxFbUtNSfoAaqJ97IPz+tPai3V40t\nlJSoyqoAUVHj01RD3T9eZQpN0/i44mM6BjqIC4uT6axCTGNHxg6a+ppo62/jSsMVtqZPXStsjpky\nSxDwAmpQeQ3wHipZ3PNAXFP5TQuisVG1FqqqVLcSQGqqai1kZ3vnaueFGNufNywojNdWvGb4y
lEh\nfFWLvYV3Hr6DU3Py8rKXSY+Zul547jw1BhGKShR/gZq2+v353nCefDpBOBxq3cLdu9DWpo5N3Ns5\nyU/q0jmcDtr622i2N9Pc10yzvZn+kX5sFhuvLH+FlCgPbGIthA+70XiDKw1XiAiO4HDBYcKDwxd0\nPaMTRBjwS6hy39nAO8A/ouoqeZJPJoiBAbXS+f599RggPFx1Ia1cCRER5sa3UAMjA5OSQau9FYfm\nmHROeFA4O7N2kh2XbU6QQvgQTdN4r/Q9GnobyIzNZH/e/tlfNAMjE8S/AqtQpS9+hqqRZBafShCt\nraq1UF4+XjRv0SK12jk313dWO0+kaRodAx2TEkLPUM+kcyxYiA+PJyUyhZSoFFIiU9w6I0OIQGAf\ntvPG/TcYcgyxI2MHq5JXzftaRiYIJ6rU93Q0ZEe5STQNKitVYmhqUscsFjWusHo1LHbPSnqPGXYM\nq3oxrmTQ3NfMiHNk0jnB1uDHiSAlKoXkyGQpmSGEG1R2VvJRxUfYLDZeK3iNhPD5bd4i6yBMNjSk\nZiLdu6f2YQC12nnFCrXaOdpHxmS7B7sntQ46BzrRpkxWiwmNmdQ6SAhP8Pgm7EIEirPVZ3nQ9oD4\nsHgOFRya19RXSRAm6exUrYXS0vHVznFxKil4+2pnh9NBa3/r42TQ1NfE4OjkhfE2i43EiMRJLYSI\nYB8fNBHCh4w6R3nrwVt0DXaxKmkVOzKnrWM6I0kQHg1CrXK+e1eteh6TkaG6kdLTvXO1s33YPql1\n0NbfhlNzTjonIjhiUusgMSLRbYt1hBDz097fztGSozg0B/ty95EVlzWn10uC8ICREVUX6e5d6O5W\nx4KCVEth9WrVcvAWmqbRPtA+qXXQN9w36RwLFhLCEx4ng9SoVFmfIISXutN8hwt1FwgLCuP1la/P\nqSUvCcJAT1vtvGqVGmPwhtXOQ6NDjzcfaeprosXewqhzdNI5IbaQSa2D5Mhkgm1e3AcmhHhM0zSO\nlR2jtqeWtOg0Xl72su6xP0kQBmhoUK2F6urJq53XrFGzkszqRtI0je6h7kkzizoHO584LzY0dlLr\nIC4sTgaThfBhAyMDvHH/DQZGB9iStoVnUp/R9TpJEG7icEBZmUoM7e3qmM02vto5MdHQ209r1DlK\nq72Vpr4mmu3NtNhbnhhMDrIGkRSRNGkwOSwozPPBCiEMVdtdywdlH2C1WDmYf5CkyNlLMEiCWKD+\n/vHVzoOuz97wcLXSeeVK9dhT+ob7JrUO2gfanxhMjgyOnNQ6WBSxSIrfCREgLtRe4E7LHWJDYzlU\ncGjWrmJJEPPU2qr2dq6oGF/tnJioupGWLjV+tbNTc9Le3/64ddDc14x9ZPK6RKvFSkJ4AqlRqY9b\nB1EhUcYGJoTwWg6ng6MlR2kfaCd/UT67s3fPeL4kiDlwOsdXOzc3q2MWC+TkqG6k1NQF3+KpBkcH\nJ7UOWvtbnxhMDrWFTuoqSopIksFkIcQknQOd/Lzk54w6R9mbs5fchNynnisJQofBQXjwQHUj2V2/\npIeGjq92jnLzL+WaptE12DWpddA91P3EeXFhcZNaB7GhsTKYLISY1YPWB5ytOUuILYTDBYefOk1d\nEsQMOjpUa6GsbPJq5zVr1I5tQTNtlzQHI46Rx1NNx1oJw47hSecEWYNIjkyeNN00NMgL5skKIXzS\nh+UfUtVVRWpUKp9a/qlpf7mUBPHEiVBToxJD/YSi5JmZqhspLW3h01R7h3ontQ46BjqeqFsUFRI1\nqXWQEJ4gg8lCCLcZGh3ijftvYB+xs2HxBjYs2fDEOZIgXEZGxovm9biqUAcHj692jp1n1emnbYIz\nkdViVXWLJrQOIkMi53dDIYTQqaG3gfcevQfAgfwDT2zKFfAJoqdHtRYePRpf7RwdPb7aOWSO1af1\nbIITFhQ2qXWQFJEkdYuEEKa4XH+Zm003iQ6J5vDKw5NK7
gdsgqivV4mhpmZ8tfOSJaq1kJWlrxtJ\nNsERQvg6p+bknYfv0GJvITc+l71L9z7+XkAliNHR8dXOHR3qGzYb5OWpxLBo0cwXkU1whBD+qGeo\nhzfvv8mIc4TC7EKWL1oOBFCCuHRJo6RkfLVzRIRa6VxQ8PTVzrIJjhAiUJS2l3Kq6hTB1mAOFRwi\nNix2wQnCTRM9jXfzpvozKWl8tbN1wqQg2QRHCBHIli1aRm1PLWUdZZysPMnBFQcXfE2fSRBjRfNS\nXIP0sgmOEEJM9lzmc48rNVxtuLrg6/lKX4rWam+VTXCEEGIWzX3NvPPwHQC+uPGLEAhdTG89eGvS\nc9kERwghnpQSlcKGJRvc0oLwmQQhm+AIIYQ+61PX0z34ZP23ufKVT1hT96QWQghftNBZTFIcSAgh\nxLQkQQghhJiWJAghhBDTkgQhhBBiWkYniP1ACVAKfO0p53zP9f1bwHqD4xFCCKGTkQnCBnwflSRW\nAp8BCqac8zKQBywDvgj8rYHxCJMUFxebHYKYJ3nvApuRCWIzUAZUASPAT4GpxUEOAP/ienwJiANS\nEH5FPmR8l7x3gc3IBJEG1E54Xuc6Nts56QbGtCDu/GGZ77Xm8jo95850zly/580fJu6OzRvev/l+\nf67HvYH87M3+PSPePyMThN6VbVMXcXjtijj5Tzrz9wLlA2Yh15MEMT/yszf794x4/4xcSb0VOIIa\ngwD4Q8AJfGfCOX8HFKO6n0ANaO8GmqdcqwzINShOIYTwV+WocV6vE4QKLhsIAW4y/SD1+67HW4GL\nngpOCCGEuV4CHqJaAH/oOvYl19eY77u+fwt41qPRCSGEEEIIIYQQQgghhBBCPOkg8EPUDKgXTI5F\nzF0O8PfAv5sdiJiTSNTi1h8Cv2JyLGJuAvJnLg71lxa+KaD+s/qBzwK/5Hr805lOFF5L98+cN1Rz\n/UfUuoc7U47rKfQH8F9QM6GEORb6/gnzzeU9nFj9wOGR6MRM/P7nbyeqiuvEv6ANNfU1GwhmfA3F\nZ4G/BJagFvl9B9jrwVjFk+b7/o2RFoT55vIe/irjLYifeC5E8RRzee/G+NzPXDaT/4LbgGMTnv+B\n62uirwBXURVgv4QwUzZzf/8SUCvpff43HD+Rjb73MAL1W+vfoCo0C/Nlo++9m/PPXJB74nO76Yr4\nbZlyzvdcX8L76Hn/OoAveywiMVdPew/7gV83JSKh19Peuzn/zHnDGMR0vLZgn9BF3j/fJ++h73Lb\ne+etCaIeyJjwPAOVBYVvkPfP98l76Lv87r3LZnIfmp5Cf8J7ZCPvn6/LRt5DX5WNH793PwEagCFU\nv9l/ch2frtCf8D7y/vk+eQ99l7x3QgghhBBCCCGEEEIIIYQQQgghhBBCCCGEEEIIIYQQQgghhBDC\nv3we+GuzgxD+zVuL9QkhZibVVoXhJEEIs0QC76EKid0BfhnYB/zbhHMKgXddj/cD11znfzzHeyUB\nHwF3gR8BVajNUwB+jtp46i7whQmv6QP+m+v4R8BW4DSqCNqnXOd8HjgKfAhUAr8J/C5wHbgAxLvO\n+wJw2RX7G0D4lPisrtfHTjhW6or7U8BF1zU/ApKn+fv9M3B4Suxjfs9171vAEdex6f7thRDCaxwG\nfjjheQxqq8Rqxj9A/xb4FdQHZQ2Q5ToeN8d7fZ/xHbT2AU7GE8TYh3g46sNy7LnTdS7AW6gkYAPW\nAjdcxz+P+iCPBBKBbuCLru99F/ht1+OxewF8C5VIpvor1/VAbe7yoevxxL/r/wX8xYR7j3Ux/ROT\nE0Sv688XgR+4HltRyXYncIgn/+2FeIK0IIRZbgMvAH8GPAf0AA7UVokHUCWLXwbeZvy392rXa7vm\neK8dwE9dj48DnRO+99uo36QvoOrmL3MdH3adCypxnHLFdxdVRnnMKcAOtLnienfCa8bOWwOcdf2d\n/yOwapoYfwZ82vX4P
7ie44rpQ9drfxdYOdtfdoIXXV83UK2vfCDPFdvUf3shniAJQpillPHN1r8N\nfN11/KeoLo8i4Arqw1cDLAu833SvLwT2ohLQOtQHaZjreyMTznOiEsbY44lb9Q5NOW/suTbhvH8G\n/m9U6+ObE+4x0UXUh3cicBDVagHVSvie67Vf4snuKYBRxn+Wrag9AMb8KerfeT2wHNXaeNq/vRCT\nSIIQZlkMDAI/RnWbPOs6fsb1+AuM/9Z/CdjF+G/kE7ts9DjPeD/7i4x3I8WgWhODwApUojBCFNAE\nBAO/+pRzNNR4yF8C9xlv5cSg6v3DeBfUVFXABtfjA677gGoB/TqqCwzUXsVJPP3fXohJgmY/RQhD\nrAH+nPHfzn/DddwB/AL4NeBzrmOtqL79t1C/1DQzPj6gxzdRG6t8FtWV1ITqpz+G2sT9PmpzlQsT\nXjN1lpA2zWPtKcenfu/rqCTX6voz6ilx/gzVavq1CceOAP+OShgnGR+HmXj9H6G64m66/k5jg9Qf\noXYSG/t79aL+DfKY/t9eCCECTghqgBlgG2pGkBBiFtKCEIEgEzV91or6jfkLM58uhICFD/wJYZbP\nMz6NdMwy1ADsROeA3/JEQEIIIYQQQgghhBBCCCGEEEIIIYQQQgghhHCn/x/ZHSCaSE1vqgAAAABJ\nRU5ErkJggg==\n",
       "text": [
        "<matplotlib.figure.Figure at 0x10755d990>"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print 'training scores: ', train_scores\n",
      "print 'testing scores: ', test_scores"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "training scores:  [ 0.06183333  0.279       0.99966667  1.        ]\n",
        "testing scores:  [ 0.04866667  0.162       0.74666667  0.05166667]\n"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "For gamma < 1 we have underfitting. For gamma > 1 we have overfitting. So here, the best result is for gamma = 1 where we obtain a training an accuracy of 0.999 and a testing accuracy of about 0.75"
     ]
    },
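    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As a rough sketch of how the best value can be read off the scores, the cell below picks the gamma with the highest testing score. It assumes the four gamma values tried were `np.logspace(-2, 1, 4)`, and it copies the testing scores printed above into a hypothetical `example_test_scores` array instead of reusing the live variables."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "# Assumption: the four gamma values tried above were 0.01, 0.1, 1 and 10.\n",
      "example_gammas = np.logspace(-2, 1, 4)\n",
      "\n",
      "# Testing scores copied from the output printed above.\n",
      "example_test_scores = np.array([0.04866667, 0.162, 0.74666667, 0.05166667])\n",
      "\n",
      "# The best gamma is the one whose testing score is highest.\n",
      "best_index = np.argmax(example_test_scores)\n",
      "best_gamma = example_gammas[best_index]\n",
      "print(best_gamma)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },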
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Grid Search"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If you take a closer look at the SVC class constructor parameters, we have other parameters, apart from gamma, that may also affect classifier performance. If we only adjust the gamma value, we implicitly state that the optimal C value is 1.0 (the default value that we did not explicitly set). Perhaps we could obtain better results with a new combination of C and gamma values. This opens a new degree of complexity; we should try all the parameter combinations and keep the better one.\n",
      "\n",
      "With GridSearchCV, we can specify a grid of any number of parameters and parameter values to traverse. It will train the classifier for each combination and obtain a cross-validation accuracy to evaluate each one.\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.grid_search import GridSearchCV\n",
      "\n",
      "parameters = {\n",
      "    'svc__gamma': np.logspace(-2, 1, 4),\n",
      "    'svc__C': np.logspace(-1, 1, 3),\n",
      "}\n",
      "\n",
      "clf = Pipeline([\n",
      "    ('vect', TfidfVectorizer(\n",
      "                stop_words=stop_words,\n",
      "                token_pattern=ur\"\\b[a-z0-9_\\-\\.]+[a-z][a-z0-9_\\-\\.]+\\b\",         \n",
      "    )),\n",
      "    ('svc', SVC()),\n",
      "])\n",
      "\n",
      "gs = GridSearchCV(clf, parameters, verbose=2, refit=False, cv=3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 14
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's execute our grid search and print the best parameter values and scores."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%time _ = gs.fit(X, y)\n",
      "\n",
      "gs.best_params_, gs.best_score_"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Fitting 3 folds for each of 12 candidates, totalling 36 fits\n",
        "[CV] svc__gamma=0.01, svc__C=0.1 .....................................\n",
        "[CV] ............................ svc__gamma=0.01, svc__C=0.1 -   8.4s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.01, svc__C=0.1 .....................................\n",
        "[CV] ............................ svc__gamma=0.01, svc__C=0.1 -   8.2s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.01, svc__C=0.1 .....................................\n",
        "[CV] ............................ svc__gamma=0.01, svc__C=0.1 -   8.3s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.1, svc__C=0.1 ......................................\n",
        "[CV] ............................. svc__gamma=0.1, svc__C=0.1 -   8.4s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.1, svc__C=0.1 ......................................\n",
        "[CV] ............................. svc__gamma=0.1, svc__C=0.1 -   8.8s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.1, svc__C=0.1 ......................................\n",
        "[CV] ............................. svc__gamma=0.1, svc__C=0.1 -   8.4s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=1.0, svc__C=0.1 ......................................\n",
        "[CV] ............................. svc__gamma=1.0, svc__C=0.1 -   8.4s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=1.0, svc__C=0.1 ......................................\n",
        "[CV] ............................. svc__gamma=1.0, svc__C=0.1 -   9.2s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=1.0, svc__C=0.1 ......................................\n",
        "[CV] ............................. svc__gamma=1.0, svc__C=0.1 -   9.0s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=10.0, svc__C=0.1 .....................................\n",
        "[CV] ............................ svc__gamma=10.0, svc__C=0.1 -   8.6s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=10.0, svc__C=0.1 .....................................\n",
        "[CV] ............................ svc__gamma=10.0, svc__C=0.1 -   8.9s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=10.0, svc__C=0.1 .....................................\n",
        "[CV] ............................ svc__gamma=10.0, svc__C=0.1 -   8.4s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.01, svc__C=1.0 .....................................\n",
        "[CV] ............................ svc__gamma=0.01, svc__C=1.0 -   8.2s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.01, svc__C=1.0 .....................................\n",
        "[CV] ............................ svc__gamma=0.01, svc__C=1.0 -   8.3s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.01, svc__C=1.0 .....................................\n",
        "[CV] ............................ svc__gamma=0.01, svc__C=1.0 -   8.6s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.1, svc__C=1.0 ......................................\n",
        "[CV] ............................. svc__gamma=0.1, svc__C=1.0 -   8.4s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.1, svc__C=1.0 ......................................\n",
        "[CV] ............................. svc__gamma=0.1, svc__C=1.0 -   8.4s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.1, svc__C=1.0 ......................................\n",
        "[CV] ............................. svc__gamma=0.1, svc__C=1.0 -   8.5s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=1.0, svc__C=1.0 ......................................\n",
        "[CV] ............................. svc__gamma=1.0, svc__C=1.0 -   8.6s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=1.0, svc__C=1.0 ......................................\n",
        "[CV] ............................. svc__gamma=1.0, svc__C=1.0 -   8.6s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=1.0, svc__C=1.0 ......................................\n",
        "[CV] ............................. svc__gamma=1.0, svc__C=1.0 -   8.6s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=10.0, svc__C=1.0 .....................................\n",
        "[CV] ............................ svc__gamma=10.0, svc__C=1.0 -   8.7s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=10.0, svc__C=1.0 .....................................\n",
        "[CV] ............................ svc__gamma=10.0, svc__C=1.0 -   8.9s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=10.0, svc__C=1.0 .....................................\n",
        "[CV] ............................ svc__gamma=10.0, svc__C=1.0 -   8.7s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.01, svc__C=10.0 ....................................\n",
        "[CV] ........................... svc__gamma=0.01, svc__C=10.0 -   8.7s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.01, svc__C=10.0 ....................................\n",
        "[CV] ........................... svc__gamma=0.01, svc__C=10.0 -   8.1s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.01, svc__C=10.0 ....................................\n",
        "[CV] ........................... svc__gamma=0.01, svc__C=10.0 -   8.3s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.1, svc__C=10.0 .....................................\n",
        "[CV] ............................ svc__gamma=0.1, svc__C=10.0 -   8.1s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.1, svc__C=10.0 .....................................\n",
        "[CV] ............................ svc__gamma=0.1, svc__C=10.0 -   8.0s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=0.1, svc__C=10.0 .....................................\n",
        "[CV] ............................ svc__gamma=0.1, svc__C=10.0 -   8.3s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=1.0, svc__C=10.0 .....................................\n",
        "[CV] ............................ svc__gamma=1.0, svc__C=10.0 -   8.6s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=1.0, svc__C=10.0 .....................................\n",
        "[CV] ............................ svc__gamma=1.0, svc__C=10.0 -   8.5s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=1.0, svc__C=10.0 .....................................\n",
        "[CV] ............................ svc__gamma=1.0, svc__C=10.0 -   8.4s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=10.0, svc__C=10.0 ....................................\n",
        "[CV] ........................... svc__gamma=10.0, svc__C=10.0 -   8.5s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=10.0, svc__C=10.0 ....................................\n",
        "[CV] ........................... svc__gamma=10.0, svc__C=10.0 -   8.5s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "[CV] svc__gamma=10.0, svc__C=10.0 ....................................\n",
        "[CV] ........................... svc__gamma=10.0, svc__C=10.0 -   8.6s"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "[Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:    8.5s\n",
        "[Parallel(n_jobs=1)]: Done  36 out of  36 | elapsed:  5.1min finished\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "CPU times: user 5min 5s, sys: 1.17 s, total: 5min 6s\n",
        "Wall time: 5min 6s\n"
       ]
      },
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 15,
       "text": [
        "({'svc__C': 10.0, 'svc__gamma': 0.10000000000000001}, 0.82666666666666666)"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "With the grid search we obtained a better combination of C and gamma parameters, for values 10.0 and 0.10 respectively, we obtained a 3-fold cross validation accuracy of 0.828 much better than the best value we obtained (0.76) in the previous experiment by only adjusting gamma and keeeping C value at 1.0."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We could continue trying to improve the results by also adjusting the vectorizer parameters in the grid search."
     ]
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Parallelizing"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Grid search calculation grows exponentially with each parameter and its possible values we want to tune. We could reduce our response time if we calculate each of the combinations in parallel instead of sequentially, as we have done. In our previous example, we had four different values for gamma and three different values for C, summing up 12 parameter combinations. Additionally, we also needed to train each combination three times (in a three-fold cross-validation), so we summed up\n",
      "36 trainings and evaluations. We could try to run these 36 tasks in parallel, since the tasks are independent.\n",
      "\n",
      "Most modern computers have multiple cores that can be used to run tasks in parallel. We also have a very useful tool within IPython, called IPython parallel, that allows us to run independent tasks in parallel, each task in a different core of our machine. Let's do that with our text classifier example."
     ]
    },
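    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To make the task count concrete, the following sketch enumerates the grid by hand with `itertools.product`. The names `gamma_values`, `C_values` and `example_tasks` are illustrative stand-ins whose values match the grid search log above; this is only a counting aid under those assumptions, not how GridSearchCV is implemented."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from itertools import product\n",
      "\n",
      "# Illustrative stand-ins for the grid used above: 4 gamma values, 3 C values.\n",
      "gamma_values = [0.01, 0.1, 1.0, 10.0]\n",
      "C_values = [0.1, 1.0, 10.0]\n",
      "n_folds = 3\n",
      "\n",
      "# Every (gamma, C) pair is one parameter combination.\n",
      "combinations = [{'svc__gamma': g, 'svc__C': c}\n",
      "                for g, c in product(gamma_values, C_values)]\n",
      "\n",
      "# Each combination is trained and scored once per fold; the tasks are independent.\n",
      "example_tasks = [(params, fold) for params in combinations\n",
      "                 for fold in range(n_folds)]\n",
      "\n",
      "print(len(combinations))\n",
      "print(len(example_tasks))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },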
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "First we will declare a function that will persist all the K folds for the cross validation in different files. These files will be loaded by a process that will execute the corresponding fold:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.externals import joblib\n",
      "from sklearn.cross_validation import ShuffleSplit\n",
      "import os\n",
      "\n",
      "def persist_cv_splits(X, y, K=3, name='data', suffix=\"_cv_%03d.pkl\"):\n",
      "    \"\"\"Dump K folds to filesystem.\"\"\"\n",
      "    \n",
      "    cv_split_filenames = []\n",
      "    \n",
      "    # create KFold cross validation\n",
      "    cv = KFold(n_samples, K, shuffle=True, random_state=0)\n",
      "    \n",
      "    # iterate over the K folds\n",
      "    for i, (train, test) in enumerate(cv):\n",
      "        cv_fold = ([X[k] for k in train], y[train], [X[k] for k in test], y[test])\n",
      "        cv_split_filename = name + suffix % i\n",
      "        cv_split_filename = os.path.abspath(cv_split_filename)\n",
      "        joblib.dump(cv_fold, cv_split_filename)\n",
      "        cv_split_filenames.append(cv_split_filename)\n",
      "    \n",
      "    return cv_split_filenames"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 16
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "cv_filenames = persist_cv_splits(X, y, name='news')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 17
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The following function loads a particular fold, fits the classifier with the specified parameter set, and returns the testing score. This function will be called by each of the parallel processes:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def compute_evaluation(cv_split_filename, clf, params):\n",
      "    \n",
      "    # All module imports should be executed in the worker namespace\n",
      "    from sklearn.externals import joblib\n",
      "\n",
      "    # load the fold training and testing partitions from the filesystem\n",
      "    X_train, y_train, X_test, y_test = joblib.load(\n",
      "        cv_split_filename, mmap_mode='c')\n",
      "    \n",
      "    clf.set_params(**params)\n",
      "    clf.fit(X_train, y_train)\n",
      "    test_score = clf.score(X_test, y_test)\n",
      "    return test_score"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 18
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This function executes the grid search in parallel processes. For each parameter combination (generated by ParameterGrid), it iterates over the K folds and submits a task to compute the evaluation. It returns the list of parameter combinations along with the list of tasks:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.grid_search import ParameterGrid\n",
      "\n",
      "def parallel_grid_search(lb_view, clf, cv_split_filenames, param_grid):\n",
      "    \n",
      "    all_tasks = []\n",
      "    all_parameters = list(ParameterGrid(param_grid))\n",
      "    \n",
      "    # iterate over parameter combinations\n",
      "    for i, params in enumerate(all_parameters):\n",
      "        task_for_params = []\n",
      "        \n",
      "        # iterate over the K folds\n",
      "        for j, cv_split_filename in enumerate(cv_split_filenames):    \n",
      "            t = lb_view.apply(\n",
      "                compute_evaluation, cv_split_filename, clf, params)\n",
      "            task_for_params.append(t) \n",
      "        \n",
      "        all_tasks.append(task_for_params)\n",
      "        \n",
      "    return all_parameters, all_tasks"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 20
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we use IPython parallel to get the client and a load-balanced view. We must first create a local cluster of N engines by using the Cluster tab in the IPython notebook. Then we create the client and the view, and execute our parallel_grid_search function:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.svm import SVC\n",
      "from IPython.parallel import Client\n",
      "\n",
      "client = Client()\n",
      "lb_view = client.load_balanced_view()\n",
      "\n",
      "all_parameters, all_tasks = parallel_grid_search(\n",
      "   lb_view, clf, cv_filenames, parameters)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 21
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "def print_progress(tasks):\n",
      "    \"\"\"Print the percentage of tasks that have finished.\"\"\"\n",
      "    progress = np.mean([task.ready() for task_group in tasks\n",
      "                                 for task in task_group])\n",
      "    print \"Tasks completed: {0}%\".format(100 * progress)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 22
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print_progress(all_tasks)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Tasks completed: 100.0%\n"
       ]
      }
     ],
     "prompt_number": 27
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def find_bests(all_parameters, all_tasks, n_top=5):\n",
      "    \"\"\"Compute the mean score of the completed tasks\"\"\"\n",
      "    mean_scores = []\n",
      "    \n",
      "    for param, task_group in zip(all_parameters, all_tasks):\n",
      "        scores = [t.get() for t in task_group if t.ready()]\n",
      "        if len(scores) == 0:\n",
      "            continue\n",
      "        mean_scores.append((np.mean(scores), param))\n",
      "                   \n",
      "    return sorted(mean_scores, reverse=True)[:n_top]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 28
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print find_bests(all_parameters, all_tasks)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[(0.82633333333333336, {'svc__gamma': 0.10000000000000001, 'svc__C': 10.0}), (0.78866666666666674, {'svc__gamma': 1.0, 'svc__C': 10.0}), (0.7466666666666667, {'svc__gamma': 1.0, 'svc__C': 1.0}), (0.23333333333333336, {'svc__gamma': 0.01, 'svc__C': 10.0}), (0.16200000000000001, {'svc__gamma': 0.10000000000000001, 'svc__C': 1.0})]\n"
       ]
      }
     ],
     "prompt_number": 29
    }
   ],
   "metadata": {}
  }
 ]
}